Prefatory Statement


Previous editions of this book have proven to be dependable and
welcome instruments of instruction and guidance in the field of
educational tests and measurements. For over twenty years many
hundreds of young teachers have obtained their first grasp of the 
problems and possibilities of measurement and evaluation from the 
pages of the predecessors of this volume. But even a timely and 
successful professional book requires revision. 

This book, as was true of its predecessor, is designed especially for 
the use of secondary-school teachers and students of secondary 
education. Essentially it is a completely revised treatment of the 
authors’ earlier volume which appeared in 1943 under a similar title. 
It continues to present the practical introductory discussion of the 
essential principles of measurement and evaluation which students 
and teachers in general find readable and valuable. Certain recent 
and significant changes in points of view and in methods and tech- 
niques of measurement, as well as an obvious lack of timeliness in 
certain of the illustrative materials in the earlier edition, now serve 
to make a revision desirable. The continued interest of instructors 
and students in the type of treatment presented in the earlier volumes 
has served to encourage the authors in the preparation of this further 
revision. 

The group specifically addressed in this volume is comprised of 
students and teachers whose major interests and responsibilities are 
in the secondary school. A second volume, parallel in general organization
and treatment, is just as specifically addressed to those teachers
and students who primarily face the problems of instruction, measurement,
and evaluation at the elementary-school level. Illustrations, examples,
and problems are chosen from material of suitable interest
and concern to the reader. Many of the problems of measurement
and evaluation are common to both the elementary and the secondary 
levels. The illustrations, however, are more meaningful if chosen from 
fields close to the fields of interest of the students and teachers. The 
present revision of these two volumes brings this treatment of meas- 
urement and evaluation quite up to the best thought and practices 
of 1954. 

As has been true throughout the history of education, the decade 
just past has been marked by numerous highly significant develop- 
ments in curricular points of view, in instructional methods and 
materials, as well as in measurement and evaluation techniques. 
A special effort has been made to retain the effective presentation of
the whole problem of measurement and evaluation in good propor- 
tion and with the common-sense perspective of the earlier volumes. 
In addition an attempt has been made to broaden the point of view 
reflected in the earlier treatments and to introduce the student to 
easily comprehended discussion of the newest and best evaluative 
techniques that have thus far appeared. In this treatment, special 
emphasis is given to methods and materials designed for the measure- 
ment of intelligence and the evaluation of certain of the more intan- 
gible aspects of the child's personality. Many of the instruments and 
techniques presented here are so new in the field of educational 
measurement that only recently have they proved their dependable 
worth to the practical educator. 

In this revision the authors have continued to place a heavy stress 
on the crucial and practical problems of improving all types of 
teacher-made examinations and tests. By principle and by example,
the construction, improvement, use, and interpretation of all types 
of evaluative and measuring devices are treated in detail. Extensive 
new material is presented on the measurement of personality and on 
performance tests, evaluative tools and techniques, and graphical 
representation. The simplified treatment of the statistical problems 
of test interpretation presented in the earlier editions is continued 
in this volume. New problems of test interpretation closely related 
to the teacher's actual needs have been prepared. A new revision of 
the workbook designed to accompany this text is also in preparation.

This revised volume is planned to provide a complete and systematic
handbook for any student or teacher requiring a straightforward
and understandable discussion of all of the fundamental ideas and 
techniques of evaluation in the classroom. It is written from the 
point of view of the classroom teacher, and at all possible points non- 
technical language is used. In instances in which technical language 
cannot be avoided, such terms are introduced in context, and are 
defined and illustrated. Many words that may lie outside of the ex- 
perience of the reader are included in the Glossary. 

It is believed that classroom teachers, supervisors, and students in 
training for teaching and supervision will find in these pages a care- 
fully written fundamental text on the principles of measurement and 
evaluation in education. It is hoped that a main contribution of this 
revision may be, not so much in the novel points of view or advances 
in the theory or technicalities of test construction as it is in the 
plainness of exposition and balance of treatment of many points 
which to some might otherwise seem over-technical. The authors
themselves think of it as a first book in measurement and evaluation 
for those who at the time of studying it may know very little, if any- 
thing, about measurement in education and its application to the 
problems of improving classroom instruction. This volume offers 
carefully selected suggestions of ways in which measurement and 
evaluative instruments may be effectively used in the teaching of 
high-school pupils. In addition, many general hints are given for the 
guidance of the student and teacher in constructing, selecting, using, 
and interpreting all types of educational tests as valuable aids in 
accomplishing this task. 

Grateful acknowledgments are here expressed to the many experi- 
enced teachers and supervisors, as well as graduate students and 
colleagues, who have contributed directly or indirectly to improve- 
ments in the formulation and statement of much of the material 
incorporated into this volume. The authors are especially indebted 
to the many users of the earlier editions of this text who by their 
friendly and critical comments have stimulated and encouraged the 
development of this volume in its present form.


Measurement, Evaluation,
and the Classroom Teacher 


The purpose of this chapter is to introduce the reader to the following
general notions underlying educational measurement and evaluation,
and to provide a preview of the contents and organization of
this book:

A. Measurement in education not a new idea.
B. Urgent need for improved measurement and evaluation
techniques.
C. Importance of evaluation and measurement to the school
and to the teacher.
D. Characteristics of educational tests.
E. Purposes tests do and do not serve.
F. General problems of measurement and evaluation.
G. Organization of this volume.


1 NEED FOR MEASUREMENT AND EVALUATION IN EDUCATION


Educational measurement not a new idea 


Teachers have always endeavored to measure the results of their 
teaching efforts as indicated by the progress of their pupils toward 
desired educational goals. Many have been equally concerned about 
the need to diagnose and remedy revealed defects in instruction. 
However, only recently has any large degree of accuracy been injected
into their methods of measurement and diagnosis. Measurement
of progress, evaluation of efficiency of instruction, and the
accompanying attempts at diagnosis were largely a matter of personal 
observation and judgment on the part of the individual teacher as 
recently as two decades ago. Actually, the recent development of 
modern educational instruments of measurement and evaluation may 
be regarded as an extension and improvement of an old practice. 
The modern educational measuring instrument presents a surpris- 
ingly accurate picture of the course objectives as well as an analysis 
of the underlying skills, knowledges, concepts, understandings, and 
other outcomes upon which accomplishment in different subject- 
matter fields depends. It points out weaknesses in learning and 
instructional procedures. It permits the establishment of specific and 
objective goals of achievement that are based upon the actual 
attainments of children under typical school conditions. Educational 
tests and the information resulting from their use in the classroom 
have come to be almost universally identified with good teaching 
practice. Today the professionally equipped teacher is expected to 
be well versed in their construction, selection, and use in the class- 
room. 


Early recognition of the need for tests in the classroom 


For many years the teacher's estimate was accepted as the sole 
measure of a pupil’s ability or accomplishment. Studies of the 
reliability of such methods gradually cast serious doubts on their 
accuracy, with the result that since that time a continuous search for 
more dependable measures has been carried on. Today the testing 
movement has passed through the first stages of its development. 
Thirty years ago it was necessary to popularize the idea. Now the 
advantages of standardized and informal objective tests are recog- 
nized by most educators and by many laymen. Moreover, the tests 
themselves have been greatly improved in content and in structure 
as a result of the critical analysis and refinement to which they have 
been subjected. The most enthusiastic students of educational tests 
are and should be their own severest critics. The specific shortcomings
of tests are coming to be fully realized. They are not mysterious
instruments for the confusion of the uninitiated, but are useful 
devices for assisting the professionally minded educator to improve 
the conditions under which children learn and teachers teach. If 
they aid in the accomplishment of this, the primary purpose of all
school supervision, and for that matter of all of the educative process,
they are thoroughly justified. 


2 MEANING OF EDUCATIONAL EVALUATION 


Several different attitudes toward the use of educational measure- 
ments in the school have held sway at various times since the 
objective approach to the measurement of pupil intelligence and 
achievement made its appearance shortly after the beginning of the 
twentieth century. These different attitudes or outlooks may be 
called by the following names: (1) testing, (2) measuring, and (3)
evaluating and appraising. 

The first concept chronologically was that of testing, which con- 
sidered the development of objective devices for testing intelligence 
and achievement of pupils to be of major importance. This attitude 
was doubtless the result of the early need for the development of 
objective instruments, for such instruments were not available in 
any significant quantity for some years after the concept of objec- 
tivity of tests first made its appearance in the field of education. 

When objective tests became fairly numerous and classroom 
teachers began to use objective methods in their own examinations, 
attention turned more toward the use of test results and toward the 
development of instruments for measuring certain of the more elusive 
types of instructional outcomes that do not lend themselves readily 
to objective measurement. This period may be characterized as one 
during which the major approach was that of measuring. 

The quite recent development of the evaluation and appraisal 
concept was doubtless impelled by the increasing realization that 
paper-and-pencil tests can measure only a limited portion of the 
outcomes of instruction and types of pupil behavior about which 
the teacher and other school officers need information. Therefore, 
the present view is that objective tests constitute probably the major 
type of evaluative instruments but that such other means of measure- 
ment as the anecdotal record, the interview, the questionnaire, the 
rating scale, and such tools as the individual pupil profile, the class 
record, the cumulative record, and the case study have a very 
significant place in the evaluation of pupil behavior and achieve- 
ment. The evaluation concept has also doubtless been stimulated by 
the recent attention of educators and psychologists to the whole 
child and his behavior. This tendency to consider the child as a
whole, rather than as an individual whose behavior and abilities can
be catalogued into a number of different compartments, places a 
definite responsibility on the user of tests and other instruments of 
evaluation for considering the child in this broad sense. It is through 
the application of the evaluation concept rather than of the narrower 
concepts of measuring and testing that this result is most effectively 
obtained. 

Perhaps a more exact idea of the meaning of educational tests to 
the classroom teacher may be obtained most readily by considering 
the characteristics that distinguish them from other types of measur- 
ing instruments in education. In the first place, the educational test 
of standardized or semi-standardized form is more limited in its usefulness
than the informal objective examination. This is necessarily
so, since the standardized test must confine itself to the general
aspects of the subject that can be covered by all classes, while the 
typical examination or objective test prepared by the classroom 
teacher covers a specific selection of the content or activities from 
the teacher's own class. In the second place, the items comprising 
standardized educational tests are commonly constructed and ar- 
ranged in accordance with certain statistical and educational princi- 
ples designed to produce more accurate measuring instruments. In 
the third place, the more useful standardized educational tests have 
been taken by a large sampling of school children under con- 
trolled conditions. From this use of the tests the norms that give 
meaning to test results and permit the interpretation of test scores 
are derived. In the fourth place, the more carefully constructed and 
valuable educational tests yield results that point the teacher's way 
to the application of specific remedial methods where needed. 

Both informal objective tests and standardized educational tests
are characterized by other features, such as validity, reliability, and 
objectivity. Validity refers to the truth of the picture of the ability 
or achievement revealed by the test. Reliability refers to the con- 
sistency with which the test reveals this picture. Objectivity refers 
to the extent to which the test results are affected by the personal 
judgment of the user. There are other important characteristics of 
educational tests, but these represent the major ones on which their 
meaning depends.


3 GENERAL CHARACTERISTICS OF EDUCATIONAL TESTS


Measurement in education 


In most fields of human endeavor the most efficient results are 
obtained when the worker has clearly defined goals toward which to 
work and dependable instruments for determining progress. In fact, 
many critical workers in measurements are seriously questioning the 
defensibility of curricular objectives that are so vague or general 
as to defy measurement. A definite aim enables the worker to direct 
his efforts toward the particular task to be accomplished. By the 
proper use of instruments for measuring results it is possible for the 
worker to know what he has accomplished. Thus a reliable and 
analytic silent reading test will give to the teacher a measure of his 
relative success in developing silent reading skills in his class. Accu- 
rate measuring instruments also aid in discovering when emphasis 
has been misplaced. For example, a pupil who, in his elementary- 
school work, has been given an unusual emphasis on oral reading 
may satisfactorily pronounce words appearing on the printed page, 
but he may be sadly lacking in ability to get meaning from these 
same words. 

Measuring instruments also make it possible for the worker to 
resort to experimental methods and thus to learn definitely whether
materials and methods are effective. This is as true in the field of 
teaching as in other fields. Without specific aims the teacher cannot 
plan his work effectively. He cannot know, except in an indefinite 
way, what he is to do. A teacher without specific aims is like a 
person who starts out to walk to a certain place without any idea 
of which direction he is to take or how far away his destination may 
be. If, on the other hand, the goals of instruction are clear-cut and
accurate means of determining progress are provided, the probability
of a timely arrival at the goal is greatly increased.


What educational tests are 


Modern standardized and informal objective educational tests 
differ in several respects from the typical teacher-made examination 
of the discussion type. In the first place, the exercises used in educational
tests are often much more carefully selected to coincide with
the purpose for which the test is designed than is true of questions
in the ordinary essay examination. For example, an objective or a 
standardized test having for its purpose the measurement of ability 
to locate the states of the United States contains only such items as 
relate specifically to that purpose. The traditional essay-type exami- 
nation frequently contains questions sampling into many different 
fields. In the second place, the items in a carefully made educational 
test are commonly arranged in accordance with certain principles of 
test construction so as to form an accurate measuring instrument. 
For the present it is enough for the reader to note that there are 
important principles of arrangement of items within a test that 
should be considered. 

In the third place, the standardized test or scale is given to a 
large number of children of varying age and school classification, 
and from these results the norms are obtained. A fourth point of 
great importance, but only recently recognized and applied in con- 
nection with test development, is the fact that test scores yielded by 
narrow-function tests of the unit and analytic types are more 
readily translatable into specific remedial procedures. This means 
that a really valuable test, in addition to giving a cross-section of 
the instructional situation, must break down and identify the under- 
lying skills so specifically that the results may be readily interpreted 
in terms of the specific kind and approximate amount of remedial 
attention needed. 

Although the above discussion relates particularly to standardized 
tests, it should be recognized that the teacher can construct his own 
objective tests to serve purposes for which no suitable standardized 
tests are available or for which teacher-made tests are more satis- 
factory. The standardized test and the teacher-made test supple- 
ment each other, and both are important in a well-rounded testing 
program. Certain other measurement devices of a non-test type also 
have great significance in the evaluation of child behavior and of 
the results of classroom instruction. 


What objective tests do 


Every alert teacher has at some time earnestly desired to know 
whether his instruction in certain school subjects was particularly 
superior or inferior, effective or ineffective. Many teachers have 
taken steps to answer this highly important question through the use
of one kind of measuring instrument or another, yet relatively few
classroom teachers are utilizing to the fullest extent one of the most 
valuable instruments at their command. Supervisors and adminis- 
trators are frequently obliged to admit that they do not have ade- 
quate data on which to base their decisions about the efficiency of 
a certain method of instruction, but must be guided largely by their 
own personal opinions. With the development of adequately valid 
and reliable measuring devices, useful objective information on such 
questions has become available. 

Objective tests, either informal or standardized, are not panaceas 
for all educational inadequacies, but unquestionably the scores they 
afford are useful in the evaluation of instruction. Scores from stand- 
ardized tests are objective and are given meaning by the process of 
standardization. For example, a quality score of 46 assigned to a 
certain handwriting sample is meaningless until it is understood 
that 46 points is standard quality for a fourth-grade child. If such 
a sample were scored as a second-grade product, it might be assigned 
a superior mark. If scored by an eighth-grade teacher, it might 
readily be given an inferior mark. Thus, by the use of standardized 
test scores, specific goals of achievement may be set up, and progress 
may be measured. Test norms themselves provide the basis for the 
more objective grade placement of pupils. Analytic and diagnostic 
tests make possible the discovery of pupils needing special corrective 
instruction. Vague objectives in the course of study may be pointed 
out and methods of instruction may be evaluated through the use of 
educational tests and critical interpretation of their results. 
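
The handwriting illustration above can be restated as a small lookup against a table of norms. The following is only a sketch under stated assumptions: the function names are hypothetical, and every norm value except the fourth-grade standard of 46 is invented for illustration and is not taken from any published handwriting scale.

```python
# Hypothetical norms table: grade -> standard quality score.
# Only the fourth-grade value of 46 comes from the text; all other
# values are invented for illustration.
HYPOTHETICAL_NORMS = {2: 30, 3: 38, 4: 46, 5: 52, 6: 58, 7: 63, 8: 68}


def compare_to_norm(score, grade, norms=HYPOTHETICAL_NORMS):
    """Classify a score as superior, standard, or inferior for a grade."""
    standard = norms[grade]
    if score > standard:
        return "superior"
    if score < standard:
        return "inferior"
    return "standard"


def grade_equivalent(score, norms=HYPOTHETICAL_NORMS):
    """Return the highest grade whose norm the score meets, or None."""
    met = [grade for grade, standard in sorted(norms.items())
           if score >= standard]
    return met[-1] if met else None


if __name__ == "__main__":
    # The same sample, judged against three different grade norms.
    for grade in (2, 4, 8):
        print(grade, compare_to_norm(46, grade))
    print("grade equivalent:", grade_equivalent(46))
```

Under these assumed norms the same score of 46 is superior for the second grade, standard for the fourth, and inferior for the eighth, which is the point of the illustration: the raw score takes its meaning from the norm against which it is read.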

Experience in the use of tests in many school systems leads to the 
conclusion that quite often there is an intangible psychological effect 
resulting from the administration of a series of tests in a school. 
The experience of the children while taking the test, and the feeling 
on the part of the teacher that his work is being carefully checked, 
are both motivating forces making for better and more effective 
teaching and learning situations. This is, however, only a by-product 
of the use of the test and is no reason for allowing the work to stop 
short of a really constructive supervisory program. 

It should be recognized that educational tests are incapable in and 
of themselves of directly improving instruction in any subject. They 
merely reveal the situation. In a sense a test may be thought of as 
an educational barometer. It reveals the educational atmospheric 
pressure, but does not do anything about it. Perhaps the parallel is
closer than at first appears. Just as low barometric readings indicate
low atmospheric pressure and forecast changing weather conditions 
or storms, low achievement test scores may presage an unsatisfactory 
educational situation. As high or rising barometric records indicate 
fair weather, so high scores on educational tests indicate a satisfactory
instructional situation.

The chief service of tests lies in their power to reveal the strengths 
and weaknesses of individual pupils or of the class as a whole. The 
use of tests must be followed by the next logical step, the develop- 
ment of a constructive supervisory program. It is not enough that 
weaknesses be revealed. They must be corrected by the use of 
properly constructed remedial exercises. 


4 TYPES OF TESTS TO USE 


Significance attached to tests and examinations 


It requires but a casual inspection of educational practices to dis- 
cover the significance that is attached to tests and examinations by 
the school, as well as by the teacher, the pupil, and the parent. 
Pupils spend a great deal of time in preparing for and writing 
examinations. The school spends considerable time and money 
setting up an organization for the preparation and administration of 
examinations. Teachers devote much effort to the preparation, scor- 
ing, and marking of examination papers, while parents in general 
set far too much store by the marks earned by their children on 
school examinations.

Examinations play an important part in the public relations con- 
tacts of the school. To a certain extent they carry to the parents in 
the community the educational purposes of the school, the aims of 
specific subjects and courses, and the various emphases held im- 
portant by the instructional agents of their school. Examinations in 
part serve as a means of revealing to both parent and pupil the basis 
for a pupil's scholastic rating, his promotions, failures, conditions, 
awards, and preparation for further educational work. 

For the teacher, examinations focus attention on specific objec- 
tives and provide a means of determining his efficiency in achieving 
them. They aid in revealing overemphasis or wrong emphasis in 
teaching method and make possible the experimental evaluation of 
subject-matter organization. The very real value of the properly
constructed test or examination as an important teaching instrument
is reflected in the recent tendency to recognize the special significance 
of low scores as well as high scores on tests and examinations. On 
the assumption that the basic objectives of the course are repre- 
sented adequately in the item content of the test, a high score may 
reflect effective teaching, an accurate memory on the part of the 
pupil, or actual mastery of the essentials of the content. A low 
score, on the other hand, may be of greater significance to the pupil 
and to the instructor, since it identifies specifically the failure of the 
pupil to master concepts which the instructor considered of suffi- 
cient importance to include in the course and in the test. In this way 
the instructional weaknesses and the remedial needs of the individ- 
uals in the class are brought into sharp focus. 


Teacher-made measures of achievement 


The emphasis on the use of the standardized test in most discus- 
sions of measurement problems often leads to the mistaken idea on 
the part of the student that these more formal types of tests are the 
most important measures of achievement. In most subject fields this 
is distinctly not the case. The use of some form of testing procedure 
for instructional purposes probably constitutes nine-tenths of the 
teacher’s measurement activity in the classroom. Accordingly, much 
more attention should be given to the improvement of the teacher’s 
informal measures of achievement. 


Using informal objective tests in the classroom 


The informal objective examination has increased in popularity 
with great rapidity during the last two decades, although it is well 
recognized that there are certain areas of educational accomplish- 
ment in which it does not measure adequately. The successful con- 
struction of objective examinations calls for the application of many 
of the same principles of test construction as are involved in the 
development of standardized tests. 


Uses of standardized educational tests 


Educational tests, because of their definiteness and objectivity, 
reveal to the teacher the status of the achievement of his class. They
point out individual pupil differences in capacity and achievement.
If standardized, they set up specific goals of achievement for the 
teacher. They reveal the results of special types of emphasis, or of 
special methods of instruction. They open to the administrator and 
the teacher hitherto untouched sources of information useful in 
giving the pupil proper educational and vocational guidance. In 
their modern conception, they reveal to the teacher the specific 
weakness of individual pupils so definitely that he is in a position 
to apply effective instructional and corrective methods. Tests them- 
selves have little or no power to bring about changes in pupil achieve- 
ment as a mere result of their use. Their chief service is their power 
to reveal pupil strengths or weaknesses. The correction of weaknesses 
is another aspect of the supervisory problem. 


Using standardized tests in the classroom 


Teachers themselves must assume a larger share of the responsi- 
bility for the use of educational tests in the classroom, and for the 
interpretation and application of the results after the tests have 
been given. Only by so doing does the teacher receive an adequate 
return from the use of tests. If this responsibility is to be wisely 
assumed, the teacher must have an understanding of the possibilities 
and the weaknesses of tests. He must be trained in their use and the 
interpretation of their results. He must be willing to exchange a
certain amount of personal effort for the information concerning his 
teaching problems that the tests can furnish him. 

Training in the use of tests comes as a result of their use. Oppor- 
tunity for this training may be afforded through the preparation and 
use of informal objective tests as substitutes for the traditional 
examination, or it may be provided by undertaking a study of some 
supervisory problems of importance to the teacher in which stand- 
ardized tests are used. 

Main uses of tests. Three main types of uses of educational tests
are noted, each resulting in a different point of view regarding the
teacher's responsibility. Tests of a detailed diagnostic type designed 
to give the teacher precise information concerning the abilities and 
limitations of his pupils are instructional in their function. The 
responsibility for the use of such material should be the teacher's. 
Tests designed to be used more particularly for survey or supervisory
purposes should probably be administered by persons other
than the teacher. The use of tests for administrative purposes, such
as for pupil classification, gradation, or sectioning, may well be the 
joint responsibility of the teacher and the administrator. In other 
words, the function the tests are intended to perform determines 
where the responsibility for their use and interpretation lies. 

Selection of the test to use. The criteria for tests—validity, relia- 
bility, adequacy, objectivity, practicality, administrability, scor- 
ability, economy, comparability, and utility—afford the teacher a 
tangible basis for the selection of the test to use for a certain purpose. 
In a general way, however, the teacher should depend upon the 
advice of persons who have made a special study of the tests, rather 
than attempt to apply these criteria personally. If affirmative 
answers to each of the following questions are available concerning 
a specific test, the teacher may feel reasonably safe in selecting it 
for use. 


1. Does this particular test measure the skills, knowledges, concepts,
understandings, applications, or appreciations I wish to measure?

2. How much time does it take to give the test? Is it long enough to
give a reliable and consistent measure?

3. Is it easily and accurately scored?

4. Has it been widely used elsewhere?

5. Does it furnish accurate and extensive norms for comparison and
interpretation?

6. Is the interpretation of the scores simple and clear?

7. Do the results point the way to a remedial program?

8. Is the test economical in terms of time and money cost per unit of
reliable information furnished by it?


Administration of the test. One of the distinctive features of 
standardized tests is that they must be given under conditions closely 
approximating those under which they were standardized, if the 
results are to be meaningful. Accordingly the teacher should follow 
the directions furnished with such tests. 

The attitude and the personality of the examiner are also im- 
portant in the administration of a test. The whole purpose of the 
test is defeated if an unnatural response is obtained. In the giving 
of tests the greatest care must be exercised to secure the cheerful 
confidence of the pupils. 

Scoring the test papers. Although the use of machine-scored tests 
and test-scoring services is growing rapidly, many users of educa- 
tional tests feel that much of the value arising from their use in the 
classroom is lost if the teacher himself does not have some first-hand
contact with the papers. For the majority of the better tests, the 
task of scoring the papers is greatly simplified and objectified by the 
use of answer keys, scoring stencils, and in many cases mechanical 
devices. Whether the tests are hand-scored by teachers or mechani- 
cally scored as a part of a testing service, the answer sheets or the 
detailed records of the pupil's results should be made available to 
the teacher for individual instructional guidance. 


5 INTERPRETATION OF TEST RESULTS 


Summarizing and interpreting the results of testing 


Skill in summarizing and interpreting test results is dependent 
upon the mastery of the following statistical techniques: 


1. A knowledge of why and how to classify and tabulate data.

2. A knowledge of how to find the common measures of central tendency.

3. A knowledge of how to express the variability of data.

4. A knowledge of how to determine the relationship between two or
more groups of data.

5. A knowledge of how to derive and use norms and derived scores for
purposes of comparison and interpretation of test results.

6. A knowledge of how to treat data for simple graphic presentation.
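
The first five of these techniques can be illustrated with a short sketch in Python. It is offered only as an illustration: the two score lists are invented, the function names `pearson_r` and `percentile_rank` are hypothetical helpers, and the percentile rank shown (per cent of scores below, counting half of any ties) is one common convention for a derived score, assumed here rather than taken from the text.

```python
import math
from collections import Counter
from statistics import mean, median, pstdev


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired score lists."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cross / (sx * sy)


def percentile_rank(score, scores):
    """Per cent of scores below `score`, counting half of any tied scores."""
    below = sum(s < score for s in scores)
    ties = sum(s == score for s in scores)
    return 100.0 * (below + 0.5 * ties) / len(scores)


# Invented scores for eight pupils on two tests (illustrative data only).
reading = [12, 15, 15, 18, 20, 22, 25, 33]
language = [10, 14, 13, 17, 21, 20, 26, 31]

# 1. Tabulation: tally scores into class intervals five points wide,
#    keyed by the lower limit of each interval.
tally = Counter((s // 5) * 5 for s in reading)

summary = {
    "mean": mean(reading),              # 2. central tendency
    "median": median(reading),
    "std_dev": pstdev(reading),         # 3. variability
    "r": pearson_r(reading, language),  # 4. relationship between two sets
    "pr_of_20": percentile_rank(20, reading),  # 5. a derived score
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The tally in `tally` is the kind of frequency distribution from which a simple graph (the sixth technique) would ordinarily be drawn.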




In addition to the use the classroom teacher may make of these 
skills in the proper interpretation and utilization of test scores, there 
is the application that may be made of them in the study of current 
educational literature. Reports of progress in education are filled 
with statistical terms and techniques. The teacher can scarcely hope 
to keep abreast of the times in his profession if he is unable to read 
current educational literature understandingly. 


6 PRACTICAL ASPECTS OF CLASSROOM MEASUREMENT 


Diagnostic testing and remedial teaching 


The analysis, identification, and measurement of the abilities 
that underlie and condition educational achievement unquestionably 
constitute the high point of the use of tests in educational practice. 
Forming the background of practically all possibilities of learning
is that curiously interwoven maze of traits, tendencies, and predispositions
known as mental ability. Naturally enough, instruments
designed to sample into this field constitute an important unit of 
the teacher's diagnostic equipment. 

Intelligence. The acceptance of the definition of intelligence as the 
capacity or power of the individual to learn or to adapt himself to 
new situations makes it relatively easy to set up devices for its 
measurement and interpretation. Intelligence tests are incapable 
of securing a direct measure of capacity unaffected by experience and 
training. They measure neither the actual process of learning, nor 
the quality of the learning equipment directly, but they provide the 
basis for inferences about the equipment from the amount of learn- 
ing that has taken place under certain conditions. The value of the 
intelligence test lies in the opportunity it affords for making this 
inference quickly and on a reasonably objective basis. Thus the
intelligence test, carefully used and critically interpreted, constitutes 
a most effective and useful instrument for classroom diagnosis. Not 
only do intelligence test scores provide valuable evidence of basic or 
general limitations and superiorities, but the related aptitude and 
group-factor tests offer most helpful hints about the existence of 
more highly specialized abilities or disabilities. In the last analysis, 
predictive tests that render such important service in certain types 
of educational and vocational guidance are specialized tests of 
intelligence. 

Personality. In the sense that an individual's personality is revealed
in all his behavior, this aspect of classroom measurement is
all-inclusive. In a somewhat narrower sense, personality has to do
with such forms of behavior as attitudes, interests, and emotional 
adjustment, all of which are important considerations in the class- 
room. Personality inventories and scales are doubtless still in their 
early stages of development, but they afford evidence of types not 
realized from intelligence or achievement tests which teachers 
should find valuable in the guidance and adjustment of their pupils. 

Achievement in the special subjects. It is now possible to evaluate 
achievement and to diagnose disabilities with practical accuracy in 
the fields of whole numbers, fractions, decimals, percentage, mensura- 
tion, interest and business forms, and in problem-solving in arith- 
metic. The subject lends itself well to analysis and identification of 
specific skills, and thus to diagnosis. In other subjects, such as read- 
ing and language, a similarly exact identification of skills has not 
been accomplished, although some progress has been made in the 
analysis of factors underlying achievement in broad skill areas in
both reading and language. Tests capable of furnishing results ac- 
curate for individual diagnosis are now available for such reading 
skills as word meaning, sentence meaning, paragraph comprehension, 
rate of reading, and for certain of the more mechanical language 
skills, such as capitalization, punctuation, and language usage. Spell- 
ing and handwriting, two of the very important mechanical elements 
of written language, have both been analyzed and can be measured 
with reasonable success. The content subjects, such as the social 
sciences and the more exact sciences, because of vagueness in the 
statements of their aims and purposes, are extremely difficult to
evaluate objectively. Then, too, mastery of the tool subjects, such as 
reading, language, and computational skills, enters into accomplish- 
ment in these fields to a very large degree. 

Changes in educational emphasis from the vocational and practical 
to the cultural are creating an increased interest in measurement in 
the fine arts. The modern emphasis upon preventive measures in 
health education and adaptation of physical education to individual 
needs is motivation for many of the modern evaluative techniques 
of a non-test type in this area. 

General educational achievement. While the emphasis throughout 
this volume is somewhat more on the measurement of the specific 
than the general aspects of school accomplishments, there is a 
recognizable need for the latter type of measurement. For general 
survey purposes, for evaluation of curricular content, and for later 
individual detailed diagnosis, such general achievement tests are 
valuable. 


Measurement and the total child 


There are certain aspects of ability, accomplishment, skill, aptitude,
character, and personality that unquestionably lend themselves
to reasonably objective measurement. The emphasis placed on these 
measurable qualities frequently gives the impression that they 
represent the major elements in the total understanding of the child. 
Such is far from the case, however, for many of the intangibles of 
the child's personality are almost certainly of greater importance, 
although in many cases they are practically impossible to measure 
objectively. This merely means that the teacher must be made keenly 
aware of the fact that something lies beyond objective measurement. 
He must see that appraisals of the child's total personality are basic
to effective classroom teaching. He must recognize that many (prob- 
ably most) of these vital appraisals must be made on the basis of 
keen observation and sympathetic analysis of his pupils. Even if 
the teacher were gifted with unusual observational and analytical 
power, superior native capacity, and natural sympathy, even if he 
were a four-year college or normal school graduate with graduate 
degrees in education, psychology, sociology, psychiatry, and medi- 
cine, he could not hope to comprehend more than a few problems of 
the child's personality. The important point here is that, while it is 
impossible to know all, it is not impossible for the teacher to be 
cognizant of and sensitive to these problems. 


After testing, what? 


This question is in the back of the mind of every classroom teacher
and every supervisor who has used standardized tests. Much of the
early use of tests was futile, since such broad and vague phases of 
educational achievement were tested that, even though reliable 
results were obtained, nothing specific could be done about the situa- 
tion. Furthermore, a great deal of the early use of tests in the class- 
room was a matter of satisfying curiosity. Teachers have a right to 
expect that something tangible will be given them in return for 
pupil time spent in testing. Pupils themselves may even have some 
rights in the matter. One way to insure this return is for the teachers 
themselves to take an active part in the program. A type of training, 
an attitude toward their profession, a clearer insight into the diffi- 
culties faced by their pupils, are thereby gained which may not come 
to them in any other way. 

The results of supervisory tests given periodically for the purpose
of checking the efficiency of pupil learning should be revealed to the
teacher and the pupils in terms of specific suggestions for the further
improvement of the situation. Instructional and diagnostic tests used 
by teachers in the classroom should furnish such specific information 
concerning the abilities and limitations of their pupils that a program 
of preventive and corrective instruction can be begun at once. 


7 ORGANIZATION OF THIS BOOK


Purpose of this book 


The purpose of this book is twofold: (1) to interest the student 
of education in the possibilities of measurement and evaluation in 
education, and (2) to stimulate the teacher and supervisor to make 
more effective use of tests and other evaluative devices as integral 
parts of enlightened teaching practice. To accomplish this twofold 
purpose the reader is gradually introduced to the meaning and pos- 
sibilities of measurement through the examination of some of the 
well-known current classroom practices. Chapter 2 briefly outlines 
certain historically important steps in the development of educa- 
tional and mental tests. Tests are classified into their major types in 
Chapter 3, and a brief description is given of each type. 

Chapter 4, which discusses at some length the characteristics or 
criteria of a good examination, is exceedingly important. It can most 
advantageously be studied after a reasonable understanding of cer- 
tain statistical and correlational techniques has been established. A 
comprehensive understanding of the three most important criteria 
of a good examination depends upon the ability to interpret correla- 
tion coefficients. It is believed that a careful study of selected sec- 
tions of Chapters 12, 13, and 14 will sufficiently acquaint the student 
with the meaning and uses of correlation for the immediate purposes 
of the discussion in Chapter 4. 

Chapters 5 to 11 present the methods of constructing and the
values and uses to the teacher of the major types of tests and evalua- 
tive techniques—standardized tests in Chapter 5, teacher-made essay 
and informal objective tests in Chapters 6 and 7, performance tests 
in Chapter 8, and evaluative techniques in Chapter 9. Intelligence 
and aptitude tests and personality instruments and techniques are 
discussed in Chapters 10 and 11.

Those especially interested in following to its logical conclusion 
the use of tests in the classroom will wish to study the remaining 
chapters with particular care, for here are presented the possibilities 
and the practical methods of using test results for analyzing and 
diagnosing the learning difficulties of pupils and the inauguration of 
preventive and remedial instruction in important school subjects. 


Study aids


The student who is genuinely interested in improving his under- 
standing of many of the points presented in this volume will find 
much profit in the careful preparation of the discussion exercises at 
the end of each chapter. Those who are still more deeply interested 
in, and wish to pursue further, the problems of measurement in edu- 
cation will find the selected references at the close of each chapter 
of particular value. Because the field of educational measurements 
is so rich and the essential material is so extensive, it is impossible 
to compress into the pages allotted to this book even a good deal of 
the material that is considered by many to be fundamental.

Teachers, themselves expert in the technique of learning, know
that passive reading, while yielding information and appreciation, 
does not develop easy, dependable skill in doing the thing described. 
To provide the opportunity for the student and the teacher actually
to secure a more complete mastery of certain of these techniques, a
Work-Book in Educational Measurements and Evaluation has been
prepared as a companion volume for this treatment. In this Work-Book
the reader solves practical problems of the type that the classroom
teacher and supervisor face. Mastery of this text and a careful
working of the projects in the Work-Book will practically guarantee 
to the reader an actual concrete experience with the major problems 
of a dynamic testing program calculated to be of the greatest service 
in the improvement of typical classroom situations. 


Topics for Discussion 


1. What specific evidence is there that the idea of measurement in
education is not entirely new but has been in the minds of teachers 
for many years? 

2. What are some of the chief differences in the attitudes of teachers 
toward measurement and evaluation thirty years ago and today? 

3. How far is the classroom teacher responsible for the understanding 
and use of educational tests? 

4. Why is it a good thing for all educational tests to be subjected to
sharp criticism by teachers?

5. Indicate several of the major characteristics of educational tests.

6. In what specific ways are informal objective tests and standardized
tests alike and in what ways are they unlike?

7. Specify several things that educational tests, when properly used,
do for the classroom teacher and his pupils. 

8. Show how tests and examinations play an important part in the 
public relations contacts of the school. 


Development of Educational
and Mental Measurement 


THE DEVELOPMENT of educational and mental measurement from the 
time of the earliest historical records to the present is traced briefly
in this chapter for the following areas and periods: 


A. Measurement to 1800.

B. Educational testing from 1800 to 1900.

C. Educational measurement and evaluation from 1900 to the
present.

D. Intelligence testing from 1800 to 1900.

E. Intelligence measurement from 1900 to the present.

F. Personality evaluation from 1800 to the present.

G. Present status of educational and mental measurement and
evaluation.


Measurement of human behavior with primary reference to the 
capacities and educational attainments of school children can well 
be divided roughly into three periods. During the first period, from 
the beginning of historical records down to about the nineteenth 
century A.D., educational measurements were naturally quite crude. 
Although the fact that individuals differ widely in their capacities 
and abilities has been recognized for several thousand years and 
educational measurement made formal entrance to the schools as 
early as medieval times, relatively little progress in educational test- 
ing was made until the nineteenth century. During the second period, 
embracing approximately the nineteenth century, educational measurement
began to assimilate from various sources the ideas and the
scientific and statistical techniques which were later to result in the
modern objective testing movement. The brief third period, dating
from about 1900 to the present, has been characterized by tremendous
advances in statistical techniques, in the measurement and
evaluation of achievement, intelligence, and personality, and in the 
classroom use of test results. 


1 MEASUREMENT TO 1800


Early oral examinations 


The first evidences of the oral examination are found in ancient 
literature. The story is told in the Old Testament (Judges 12:5-7) of 
the test the Gileadites gave to the enemy Ephraimites who wished 
to cross the Jordan. When asked to pronounce the word “Shibboleth,” 
the Ephraimites could answer only with “Sibboleth,” whereas people 
of the friendly tribes could respond with the correct pronunciation. 
Forty-two thousand Ephraimites were killed because they failed to 
pass this objective test.¹ Socrates, in a method he made famous,
subjected his pupils to exhaustive and searching questioning. Oral 
quizzing, Socratic or otherwise, has undoubtedly been a part of 
classroom procedure from the beginnings of teaching activity—in 
fact, there have been and still are times when, for certain teachers, 
it constitutes practically the whole of the teaching act. 


Early written examinations 


Written tests are probably of more recent origin than oral quizzes, 
but even they date back many centuries. As early as 2200 B.C., China
had an elaborate national system of examinations for the purpose 
of selecting her public officials, and these examinations have been 
known down through the ages for their unusual severity. Confined in 
isolated cells for hours at a time, candidates were compelled to write 
lengthy papers or treatises on assigned topics.²


1 Norma V. Scheidemann, “The Earliest Recorded Objective Test.” School and
Society, 20:702; June 1, 1929.

2 W. A. P. Martin, The Chinese: Their Education, Philosophy, and Letters.
Harper and Brothers, New York, 1881. p. 45-49.


Recognition of individual differences


Individual differences among people have long been recognized. 
Plato, nearly four centuries B.C., divided his ideal society into the
three classes of workers, protectors, and rulers. He believed that persons
suited to each class should receive education for the fullest development
of their personalities.³ Quintilian, shortly after the start
of the Christian era, wrote that masters should observe differences in 
ability and inclinations of persons they instruct, for the “forms of 
mind are not less varied than those of bodies."⁴


Classification of personality types 


Impressionistic methods of judging personality and of analyzing 
character have doubtless been in vogue for many centuries. They are 
based in the main on physiognomy, body build or glandular makeup, 
and divination. Representative are phrenology, astrology, palmistry, 
and graphology.⁵


First educational tests 


The first tests used for the measurement of the results or outcomes 
of education were probably not unlike certain of the performance 
tests of today, at least to the extent that they measured physical 
performance and that they were not paper-and-pencil tests. 

Among various primitive tribes, in which the young men were 
taught to hunt, fish, and fight, the initiation ceremonies prerequisite 
to their admission to the ranks of adult males tested knowledge of 
tribal customs, endurance, bravery, and other skills and abilities 
thought necessary for tribal protection.⁶

The ancient Spartans, whose educational curricula for their youth 
stressed physical development and stoicism, conducted examinations 
as early as 500 B.C. in which the young men underwent painful ordeals.⁷
In ancient Athens, the stress upon athletics and aesthetic
development led to evaluation by means of games and contests and
of reading, writing, and singing ability.⁸


3 Edgar W. Knight, Twenty Centuries of Education. Ginn and Co., Boston, 1940.
p. 62.

4 William Boyd, The History of Western Education. A. and C. Black, Ltd.,
London, 1921. p. 76.

5 Henry E. Garrett, Great Experiments in Psychology, Third edition. Appleton-
Century-Crofts, Inc., New York, 1951. p. 175-81.

6 Charles Russell, Standard Tests. Ginn and Co., Boston, 1930. p. 14-15.


First tests in the school 


In medieval times, the oral examination was used in universities. 
The University of Bologna by A.D. 1219 and the University of Paris 
before the close of the thirteenth century required degree candidates 
to defend their theses orally. However, the written educational exam- 
ination probably made its first appearance for educational use at 
Cambridge, England, in 1702.⁹


2 EDUCATIONAL TESTING FROM 1800 TO 1900 


Early educational tests in America 


According to available records,¹⁰ the first examinations of note in
this country were those of Boston in 1845. Prior to that date the 
school committee had orally examined all Boston pupils, or at least 
those in the highest class in each school, annually. As the pupils in- 
creased in numbers, this task became onerous and eventually received 
only perfunctory attention. Finally, the sub-committee appointed to 
survey the grammar departments of the schools in 1845 decided to 
use written examinations in lieu of the time-consuming oral ex- 
aminations. These subject examinations, in the fields of arithmetic, 
astronomy, geography, grammar, history, and natural philosophy, 
were used to rank the schools in order of merit. 

This Boston examination project is truly a highlight in the history 
of education in the United States. It made a great impression on 
Horace Mann, who at that time was Secretary of the Massachusetts 
Board of Education.¹¹ As editor of the Common School Journal, he
published extracts from the report and made many noteworthy comments¹²
on the subject of examinations. He concluded that the new
written examination was so superior to the old oral quiz that no
school committee would ever lapse into the former inadequate and
uncertain practice. The reasons advanced by Horace Mann in support
of the written examination were as follows:¹³


1. It is impartial.

2. It is just to the pupils.

3. It is more thorough than older forms of examination.

4. It prevents the “officious interference” of the teacher.

5. It “determines, beyond appeal or gainsaying, whether the pupils
have been faithfully and competently taught.”

6. It takes away “all possibility of favoritism.”

7. It makes the information obtained available to all.

8. It enables all to appraise the ease or difficulty of the questions.


7 Ibid. p. 16.

8 Knight, op. cit. p. 52-53.

9 Albert R. Lang, Modern Methods in Written Examinations. Houghton Mifflin
Co., Boston, 1930. p. 2-3.

10 Otis W. Caldwell and Stuart A. Courtis, Then and Now in Education, 1845-
1923. World Book Co., Yonkers, N. Y., 1923. Chapters 1, 3.

11 Since Horace Mann doubtless exerted considerable influence on the sub-
committee, the examinations were probably reflections of his ideas.


Although these ideas were apparently those represented by modern 
tests, the instruments were inadequate. It is significant to note also
that in successive issues of the Common School Journal Mann sug- 
gested most of the elements in examinations that are found in the 
modern measurement and evaluation movement. 


Early objective tests 


To Rev. George Fisher, an English schoolmaster, goes the credit 
for devising and using what were probably the first objective meas- 
ures of achievement. His “scale books," used in the Greenwich Hos- 
pital School as early as 1864, provided means for evaluating accom- 
plishments in handwriting, spelling, mathematics, navigation, Scrip- 
ture knowledge, grammar and composition, French, general history, 
drawing, and practical science. In such subjects as handwriting and 
drawing, where qualitative rather than quantitative evaluation was 
the custom, specimens of pupil work were compared with "standard 
specimens" to determine numerical ratings. The numerical values 
for spelling and other subjects to which quantitative measures of 
achievement were commonly applied depended upon errors in performance.¹⁴


12 Horace Mann, “Boston Grammar and Writing Schools.” Common School 
Journal, Vol. VII, No. 19; October 1, 1845. Also reported in: Caldwell and Courtis, 
op. cit. p. 237-72.

13 Caldwell and Courtis, op. cit. p. 37.

14 E. B. Chadwick, “Statistics of Educational Results.” The Museum, A Quarterly
Magazine of Education, Literature, and Science, 3:480-84; January 1864.


Although Fisher's "scale books" included the germ of many of the
ideas that are incorporated in our present-day educational scales, 
his work produced no lasting results because he lived too far in ad- 
vance of the thought and educational practice of his day. 


First objective tests in America 


In America, the real inventor of the comparative test was Dr. J. 
M. Rice, who, in 1894,¹⁵ hit upon the idea he developed so effectively
that it became the foundation of objective measurement in educa- 
tion. Rice, having administered a list of spelling words to pupils in 
many school systems and analyzed the results, confounded the edu- 
cators at the 1897 session of the Department of Superintendence of 
the National Education Association with the declaration that pupils 
who had studied spelling thirty minutes a day for eight years were 
not better spellers than children who had studied the subject fifteen 
minutes a day for eight years. Rice was attacked and reviled for this 
“heresy,” and some educators even attacked the use of a measure of 
how well pupils could spell for evaluating the efficiency of spelling 
instruction. They contended that spelling was taught to develop the 
pupils’ minds and not to teach them to spell. It was more than ten 
years later that Rice’s pioneering resulted in significant attention to 
the objective method in educational testing.¹⁶


3 EDUCATIONAL MEASUREMENT AND EVALUATION FROM 1900 
TO THE PRESENT 


First book on educational measurement 


Thorndike brought out the first book dealing primarily with mental
and educational measurements in 1904,¹⁷ and both through this
book and his later influence on his students became more than any
other person responsible for the early development and popularization
of standardized educational tests.


15 Leonard P. Ayres, "History and Present Status of Educational Measurements."
The Measurement of Educational Products. Seventeenth Yearbook of the National
Society for the Study of Education, Part II. Public School Publishing Co.,
Bloomington, Ill., 1918. p. 11.

16 Ibid. p. 12.

17 Edward L. Thorndike, An Introduction to the Theory of Mental and Social 
Measurements. Teachers College, Columbia University, New York, 1904. 


First standardized achievement tests


Stone, a student of Thorndike's, published his arithmetic reason- 
ing test, the first standardized instrument to make its appearance, 
in 1908.¹⁸ Thorndike in 1909 published his Scale for Handwriting of
Children, the first standardized achievement scale.¹⁹ During the
period 1909 to 1915, a series of arithmetic tests and five scales for 
measuring abilities in English composition, spelling, drawing, and 
handwriting were published.²⁰ It is interesting to note that only two
of these pioneer instruments were tests, while the remaining five were 
scales. 

Educators at first opposed the standardized test and derided the 
testers. However, the spread of standardized testing continued, under 
the stimulation of at least three early developments: 

(1) The numerous important studies of the accuracy of school
marks, revealing the fact that they are highly subjective and inac- 
curate, demonstrated the need for instruments that would yield more 
accurate measures of achievement. 

(2) The surveys of certain of the larger school systems both stim- 
ulated the construction and use of tests and were influenced by the 
development of more objective devices for measuring the abilities 
of pupils. 

(3) The development of educational measurements in research 
bureaus organized in many of the larger school systems, universities, 
and state departments of public instruction was influential in popu- 
larizing the use of educational tests. Although the pioneer and most 
of the early standardized tests were for use in the elementary school, 
it was not many years until the high school and even the college 
were well provided with such instruments. 


Development of informal objective examinations 


The idea of the informal objective examination, referred to during
its early days rather loosely as the "New-Type Test" and the "Objective
Test," apparently was first publicly expressed by McCall,²¹
whose article in 1920 first suggested that teachers do not need to
depend solely upon standardized tests but that they can construct
their own objective tests for classroom use. The pioneer book dealing
almost entirely with this testing adaptation was published in 1924.²²
The informal objective test has since come into such wide use that
a survey in 1936 of testing practices among 1600 high-school teachers
widely distributed throughout the country showed that 74 per cent
used the informal objective and an additional 10 per cent used a combination
of the informal objective and essay examinations.²³


18 Cliff W. Stone, Arithmetical Abilities and Some Factors Determining Them.
Contributions to Education, No. 19. Teachers College, Columbia University, New
York, 1908.

19 Edward L. Thorndike, “Handwriting.” Teachers College Record, 11:83-175;
March 1910.

20 C. W. Odell, Educational Measurements in High School. Century Co., New
York, 1930. p. 34-35.


Later development of standardized achievement tests 


The history of achievement measurement since the late twenties 
has been characterized mainly by increasing recognition of the fact 
that test results offer only one, although the major one, of the types 
of acceptable evidence on pupil achievement. This tendency toward 
evaluation, which is broader in scope than testing, has been accom- 
panied by a strong trend toward more scientific use of measurement 
tools. 

Although the contributions of Tyler have been significant in both 
the standardized testing and the informal objective testing move- 
ments, it is probably in the latter field that his influence was first felt. 
He outlined steps of procedure for test construction and validation 
which clearly pointed out the essential dependence of a program of 
achievement testing on the objectives of instruction and the recog- 
nition of forms of pupil behavior indicating attainment of the desired 
instructional outcomes.²⁴ Perhaps he more than any other single
test specialist was responsible for the extension of achievement test- 
ing to the more intangible outcomes of instruction, for his contribu- 
tions nearly twenty years ago doubtless did much to bring into being 


21 William A. McCall, “A New Kind of School Examination.” Journal of Educational
Research, 1:33-46; January 1920.

22 G. M. Ruch, The Improvement of the Written Examination. Scott, Foresman
and Co., Chicago, 1924.

23 J. Murray Lee and David Segel, Testing Practices of High-School Teachers.
U. S. Office of Education Bulletin, 1936, No. 9. U. S. Government Printing Office,
Washington, D. C., 1936. p. 6.

24 Ralph W. Tyler, “A Generalized Technique for Constructing Achievement
Tests.” Educational Research Bulletin, 8:199-208; April 15, 1931.

the broad modern conception of evaluation to replace the earlier
and narrower concept of testing.²⁵


Development of evaluation instruments 


The Eight-Year Study of member schools of the Progressive Edu- 
cation Association, completed some twelve years ago, affected meas- 
urement and evaluation practices markedly. The evaluation staff, 
working under Tyler’s direction, developed a series of instruments 
for measuring such outcomes as logical reasoning, ability to apply 
principles in the sciences, ability to interpret data, and ability to 
interpret literature.²⁶ These and other instruments since made available,
including those developed in the Cooperative Study of General
Education, are designed to measure functional and relatively in- 
tangible outcomes in areas of behavior rather than more formal and 
tangible instructional outcomes in separate subjects or areas of the 
curriculum. A related trend is evidenced in the batteries of tests de- 
veloped during the past ten or so years for the measurement of 
general educational development. 


Development of evaluative tools and techniques 


Paralleling the development of paper-and-pencil tests has been 
the development of other evaluative tools and of techniques for meas- 
uring procedures involved in and products resulting from certain 
types of skill performances and various other aspects of behavior of 
the whole child. Prominent among such evaluative tools are the check 
list, the rating scale, the questionnaire, the pupil profile, the class 
record sheet, and the cumulative record. Evaluative techniques are 
represented by the anecdotal report, the interview, the case study, 
the sociogram, and observational analyses of group dynamics. 


25 Ralph W. Tyler, Constructing Achievement Tests. Ohio State University,
Columbus, Ohio, 1934.

26 Eugene R. Smith, Ralph W. Tyler, and others, Appraising and Recording
Student Progress. Harper and Brothers, New York, 1942.




4 INTELLIGENCE TESTING FROM 1800 TO 1900 


Scientific recognition of individual differences 


It was not, apparently, until 1796, that individual differences in 
mental abilities were first brought under not the microscope, but, 
literally, the telescope. It was in that year at the Greenwich Astro- 
nomical Observatory in England that one of the observers who re- 
corded the instant of time at which stars crossed the lines on telescope 
lenses was discharged because his observations consistently differed
slightly from those of his colleagues. In 1816, however, it was discovered
by an astronomer who read an account of this incident that
an error of observation, called the “personal equation,” characterized
the work of all observers and that the amount of error varied from
person to person and also in the same person from time to time.²⁷


Scientific study of individual differences 


Galton, with the publication of his Hereditary Genius in 1869, 
brought the scientific study of individual differences into focus, de- 
veloped it further by instituting measurement of various human 
physical traits and motor abilities, and even investigated mental 
ability by methods which many years later became highly fruitful. 


Foundations of statistical method 


Galton's most important contribution to educational measure- 
ments was not in the field of individual differences, however, but in 
the derivation of statistical methods. Here, in devising a system of 
"standard scores" and in developing graphically the idea for an 
objective measure of relationship, the correlation coefficient, he fur- 
nished tools essential not alone to the development of educational 
and mental testing but also to scientific method in education. Pearson 
later formulated the method now most commonly used for calculating
the correlation coefficient.²⁹
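
The two tools named above can be illustrated with a small computation. What follows is only a minimal sketch in Python, not a procedure taken from this book: the function names and the two short lists of scores are invented for illustration, and the standard deviation is assumed to be computed over the whole group (dividing by the number of pupils). Standard scores express each raw score as a deviation from the group mean in standard-deviation units, and the product-moment correlation coefficient can then be obtained as the mean product of paired standard scores.

```python
import math

def standard_scores(scores):
    """Express each raw score as a deviation from the mean in standard-deviation units."""
    n = len(scores)
    mean = sum(scores) / n
    # Population standard deviation (divide by n); an assumption of this sketch.
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]

def correlation(xs, ys):
    """Product-moment correlation: the mean product of paired standard scores."""
    zx = standard_scores(xs)
    zy = standard_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical scores of five pupils on two tests.
reading = [12, 15, 11, 18, 14]
arithmetic = [24, 30, 22, 36, 28]
r = correlation(reading, arithmetic)
```

Because each hypothetical arithmetic score is exactly twice the corresponding reading score, the pupils hold identical relative positions on both tests and the coefficient comes out at the upper limit of the scale; real paired scores would ordinarily give a value well inside the range from -1 to +1.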


27 Anne Anastasi and John P. Foley, Jr., Differential Psychology: Individual and 
Group Differences in Behavior, Revised edition. Macmillan Co., New York, 1949. 
p. 7-8. 

28 Joseph Peterson, Early Conceptions and Tests of Intelligence. World Book 
Co., Yonkers, N. Y., 1925. p. 73-75. 

29 Garrett, op. cit. p. 269-72.


Early attempts to measure intelligence


Dr. E. S. Chaille, an American physician, is credited as early as
1887 with the development of standards and simple tests for judging
the mental levels of children to the age of three and with having
implied, although not definitely used, the concept of mental age³⁰
as an index of mental maturity.

Cattell apparently first used the term “mental test”³¹ in 1890,
almost at the beginning of the period during which scientific method
was first applied to the measurement of mental ability. Attempts
during the last decade of the nineteenth century by Cattell and others
to measure intelligence by means of physical characteristics, sensory
acuity, and motor skills tests gave, for the most part, negative
results.³²

During the same period, Binet and his colleagues were experimenting
in France with tests of a somewhat similar but less specific
type. In 1895, Binet and Henri described ten types of tests which,
differing from American tests mainly in the much greater complexity
of behavior they would measure, they thought were likely to discriminate
between levels of mental ability.³³


5 INTELLIGENCE MEASUREMENT FROM 1900 TO THE PRESENT 


First individual intelligence tests 


Binet and Simon brought out the first intelligence scale in 1905, 
devising it primarily for the purpose of selecting mentally retarded 
pupils who required special instruction. This pioneer individual in- 
telligence scale utilized the basic idea of interpreting the relative 
intelligence of different children at any given chronological age by 
the number of tests of varied types and increasing levels of difficulty 
they could pass. These characteristics were all re-embodied in the 
1908 and 1911 revisions of the Binet-Simon Scale and also are basic 
to most individual intelligence scales even today. The 1908 Revision 


30 Florence L. Goodenough, “An Early Intelligence Test.” Child Development,

introduced the fundamentally important concept of mental age
(MA) and provided means for obtaining it.³⁴


Individual intelligence tests in America 


Goddard, Kuhlmann, and Terman all adapted the Binet-Simon
tests to use with American children during the period from 1911 to
1916. Terman and his collaborators made the Stanford Revision of
the Binet Scale available in 1916, and in 1937 followed it with a
second and more complete revision. These revisions make use of the
intelligence quotient (IQ), based on the relationship between a child's
mental age and his chronological age.³⁵
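
The relationship between the two ages is conventionally expressed as a ratio: mental age divided by chronological age, multiplied by 100. The short Python sketch below assumes that conventional ratio form and takes both ages in months; the function name and the sample ages are hypothetical and serve only to show the arithmetic.

```python
def intelligence_quotient(mental_age_months, chronological_age_months):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months

# Hypothetical child: mental age 10 years 0 months, chronological age 8 years 0 months.
iq = intelligence_quotient(120, 96)
```

On this ratio a child whose mental age equals his chronological age receives a quotient of 100, and the hypothetical child above, whose mental age runs two years ahead, receives 125.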


Group intelligence tests 


Although various psychologists had been working on a group in- 
telligence test, and Otis was near the point of issuing such a test 
around 1917, the Army Alpha test, used for measuring and placing 
American army recruits and draftees during World War I, was the 
first group intelligence test to be published. The Army Alpha test, 
widely used for testing men who could read and understand English, 
was accompanied by Army Beta, a non-language test for use with 
illiterates and men who, although perhaps literate in a foreign language,
could not read English.³⁶ These tests were widely used by
educators after the close of the war. 

Group intelligence tests began making their appearance almost 
immediately following the end of World War I, and the period from 
1918 to the middle twenties was marked both by the publication of 
many such tests and by an upsurge of general interest in intelligence 
testing. Although the testing techniques have been refined consid- 
erably since then, the past quarter century has brought no outstand- 
ing changes in the methods of measuring general intelligence. The 
Army General Classification Test and the Army Individual Test of 
Mental Ability served functions in World War II closely similar to 
those of Army Alpha and Army Beta in World War I. Several other 
armed service branches developed comparable instruments for use 
in their programs of selection and classification. 


34 Freeman, op. cit. p. 86-88. 
35 Ibid. p. 101. 
36 Ibid. p. 113-35.


Aptitude or specific intelligence tests


The measurement of aptitudes, or those potentialities for success 
in an area of performance that exist prior to direct acquaintance 
with that area, has been tied up with intelligence testing both fore 
and aft. Early attempts to measure general intelligence were by 
means of tests of many specific traits and aptitudes, but that ap- 
proach was dropped when Binet showed that tests of more complex 
forms of behavior were superior. It was soon apparent, however, 
that general intelligence tests were not highly predictive of certain 
types of performance, especially in the trades and industries. 

Münsterberg's aptitude tests for telephone girls and streetcar 
motormen in 1913 were followed by tests of mechanical aptitude, 
musical aptitude, art aptitude, clerical aptitude, and aptitude for 
various subjects of the high-school and college curricula prior to 
1930.³⁷ Spearman's splitting of total mental ability into a general
factor and many specific factors³⁸ had its influence on this movement,
and accounted for the fact that aptitude tests are frequently
called specific intelligence tests. 


Factored intelligence tests 


With the development of factor analysis methods, largely within 
the past two decades, certain group factors of intelligence thought
to differ from the specific factors or aptitudes and also from general
intelligence have emerged.³⁹ These were first recognized in measurement
practice by the introduction of separate linguistic and quantitative,
or verbal and non-verbal, scores into certain tests of mental
ability that continued to furnish a general measure of intelligence. 
In addition, several batteries for the measurement of primary mental 
abilities, differential aptitudes, and general aptitudes, each designed 
to distinguish several group factors of intelligence, have made their 
appearance during the last ten or twelve years. 


37 Goodwin Watson, “The Specific Techniques of Investigation: Testing Intelligence,
Aptitudes, and Personality.” The Scientific Movement in Education. Thirty-Seventh
Yearbook of the National Society for the Study of Education, Part II,


6 PERSONALITY EVALUATION FROM 1800 TO THE PRESENT


Antecedents of modern personality tests 


Personality testing had its antecedents in the work of Kraepelin 
and Sommer on free association tests during the last decade of the 
nineteenth century. Although free association tests have persisted to 
the present day, the questionnaire and rating scale methods used by 
Galton and others at still earlier dates became the dominant early 
methods of personality measurement in America.⁴⁰


Modern personality inventories 


Woodworth devised a Personal Data Sheet, in reality an inventory 
of neurotic tendencies and emotional maladjustment, for use with 
American soldiers during World War I. This was probably the outstanding
early contribution in this field.⁴¹ A significant number of
these structured personality inventories have been developed during 
the past thirty years for the measurement of adjustment, attitudes, 
and vocational interests. 


Projective methods 


Jung in 1905 published a free association test designed to reveal
emotional complexes.⁴² Hartshorne, May, and their colleagues made
exhaustive studies of conduct in largely unstructured or free response
situations in the Character Education Inquiry.⁴³ Although the
Rorschach, the first modern projective test, was introduced in 1921,
it was not until some fifteen years ago that projective techniques
employing such unstructured situations as inkblots and pictures came
into wide use in the study of personality.⁴⁴ An outgrowth of psychiatry
and academic psychology, these unstructured methods of


40 Anastasi and Foley, op. cit. p. 22.

41 Watson, op. cit. p. 368.

42 Ibid. p. 368.

43 Hugh Hartshorne, Mark A. May, and others, Studies in the Nature of Character,
Volumes I-III. Macmillan Co., New York, 1928, 1929, 1930.

44 Watson, op. cit. p. 369.

studying personality came to be termed projective methods only
in 1939.⁴⁵


7 PRESENT STATUS OF EDUCATIONAL AND MENTAL 
MEASUREMENT AND EVALUATION 


Although measurement and evaluation, whether of achievement, 
intelligence, or personality, are still in a developmental stage, 
Monroe⁴⁶ stated that the movement, beyond its infancy in 1920,
had reached early adulthood by 1945. He also commented that the 
fifty or more types of objective test items or item groups represented 
a marked extension and improvement in techniques of measurement 
and evaluation. 

Reavis⁴⁷ reported Educational Records Bureau estimates that in
1944 approximately 60 million tests were administered to around 20
million persons in the United States. Many of these tests were used
by the armed services and civil service, but a significant proportion
were used in schools, colleges, and industry. Woodruff and Pritchard⁴⁸
indicated that in 1948 their test files included 1080 tests representing 
the output of 74 test publishers. 

The measurement and evaluation aspects of the school program 
have markedly increased in scope and significance during the past 
score or so of years. Measurement and evaluation techniques now not 
only reflect developments in educational philosophy and psychology 
but also increasingly are furnishing evidence that aids school of- 
ficials in charting the future course. Pupil guidance may be con- 
sidered the central theme, for directly or indirectly all educational 
planning and procedures are designed to effect improvements in the 
education and in the guidance of the individual school child. The 
classroom teacher remains the key person in pupil measurement and 
evaluation. Measurement and evaluation specialists, subject-matter 


45 Helen Sargent, “Projective Methods: Their Origins, Theory, and Application

specialists, and specialists in areas of child behavior increasingly cooperate
with and depend upon the classroom teacher in the development
of new instruments, tools, and techniques for pupil appraisal.


Topics for Discussion 


1. What were some of the ancient forerunners of educational tests?
2. Show how educational testing had its origins centuries before
   standardized and informal objective tests were developed.
3. Discuss the early recognition of individual differences in mental
   ability and personality.
4. List and evaluate the most important ideas concerning examinations
   expressed by Horace Mann.
5. Discuss the “scale books” developed by Rev. George Fisher and
   compare them with modern educational scales.
6. What was the significance for objective measurement and for
   educational research of the contributions made by Dr. J. M. Rice?
7. What three important educational developments of the first two
   decades of the present century indirectly stimulated the growth of
   interest in educational measurements?
8. Who were the pioneers in the development of standardized educational
   tests? What was their influence on the measurement movement?
9. Indicate the part played by informal objective examinations in the
   development of educational measurement.
10. What influences contributed to the rise of evaluation instruments,
    tools, and techniques?
11. By what method did workers in the field of mental ability first
    seek to measure intelligence? How successful were their attempts?
12. Discuss the contributions of Binet and Simon to the
    intelligence-testing movement.
13. Briefly discuss the development of group intelligence testing from
    World War I to the present.
14. What types of abilities are measured by general intelligence tests?
    By specific intelligence or aptitude tests? By tests of group factors
    of intelligence?
15. Discuss the early attempts to measure personality objectively and
    the more recent structured personality inventories and projective
    methods.
16. Comment upon the status of educational and mental measurement
    and evaluation today.
17. Discuss the significance of measurement and evaluation for
    educational planning and practices.


Educational and Mental Measuring
Instruments and Techniques


THIS CHAPTER presents a classification of the tests and other tools and 
techniques used in educational and mental measurement: 


A. General classification of educational and mental tests.
B. Basic types of educational tests.
C. Evaluative tools and techniques of an educational nature.
D. General intelligence tests.
E. Specific intelligence tests.
F. Group-factor tests of intelligence.
G. Performance tests of intelligence.
H. Inventories and techniques for personality evaluation.


The measurement and evaluation of the whole child involve the 
use of many tests and other devices that cannot properly be called 
tests. Most of the remaining chapters of this volume are devoted to 
treatments of the various types of tests, non-test tools, and techniques 
characterized only briefly here. 


1 GENERAL CLASSIFICATION OF TESTS


Educational and mental tests 


Modern tests are so varied in type and purpose that it is extremely 
difficult to classify them clearly. Tests can be classified in terms of 
their forms, their origins, their functions, and their content. In this
chapter tests are first classified broadly by function (educational,
intelligence, and personality) and within major divisions are classified
by whatever pattern seems most likely to familiarize the student 
with their major characteristics. 

Educational tests have as their primary function the measurement 
of the results or effects of instruction and learning. On the other 
hand, intelligence tests, or psychological examinations, have as their 
purpose the measurement of pupil intelligence or mental ability in a 
large degree without reference to what the pupil has learned either 
in or out of school. Personality tests attempt to measure such in- 
tangible aspects of behavior as attitudes, interests, and emotional 
adjustment. 

There is not complete uniformity of terminology with respect to 
educational tests and mental tests. Although the former have a 
commonly-accepted meaning, the latter are thought variously to in- 
clude educational, intelligence, and personality tests, to include 
intelligence and personality but not educational tests, and even to 
mean the same thing as intelligence tests. Modern practice seems 
often to make use of the three-way classification—educational tests, 
intelligence tests, and personality inventories. As this distinction 
appears to be most satisfactory for the purposes of this book, it will 
be followed throughout this volume. 


Tests, scales, and scaled tests 


Objective tests can be classified in a manner that cuts across 
the three fields of educational, mental, and personality testing—into 
tests and scales, and also scaled tests. This distinction is of some 
value, but at times it results in confusion since certain types of 
objective tests resemble scales or contain certain features of scales 
as an essential part of their construction. 

In general terms, a test is an instrument designed for the measure- 
ment and evaluation of any knowledge, quality, or ability. It may 
measure degree or amount of achievement, mental abilities, or even 
such intangible qualities as personality and character traits. It may 
be made up of items of similar difficulty, items arranged in increasing 
order of difficulty, or items arranged in such other ways as by types 
of items or order of occurrence of topics in a course. Ordinarily the 
test is used in the classroom by the pupils. 

A scale is a series of objective samples or products of different
difficulty or quality that have been arranged in a definite order or
position, usually in ascending order of difficulty or merit. The sam- 
ples are equally spaced on a scale of value, of difficulty, or of quality. 
Usually the scale is employed by the teacher as an aid in the evalu- 
ation of the particular product. 

A scaled test combines certain properties of the test and the scale. 
If the items in a test are arranged in order of increasing difficulty, 
the instrument is a scaled test. The process of determining the diffi- 
culty of test items and arranging them in an ascending order on that 
characteristic is called scaling. 
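
The scaling process just described can be sketched in a few lines. The Python example below is only an illustration under stated assumptions: item difficulty is estimated as the proportion of a tryout group missing the item, which is one common index but not necessarily the one used by any particular test maker, and the function name, item labels, and right-wrong marks are all hypothetical.

```python
def scale_items(responses):
    """Order item labels from easiest to hardest.

    `responses` maps each item label to a list of 1/0 marks (right/wrong)
    from a tryout group. Difficulty is taken here as the proportion of
    pupils who missed the item, which is an assumption of this sketch.
    """
    difficulty = {
        item: 1 - sum(marks) / len(marks)
        for item, marks in responses.items()
    }
    return sorted(difficulty, key=difficulty.get)

# Hypothetical tryout results for three items and five pupils.
tryout = {
    "item_a": [1, 0, 0, 0, 1],   # 2 of 5 correct
    "item_b": [1, 1, 1, 1, 1],   # 5 of 5 correct
    "item_c": [1, 1, 0, 1, 1],   # 4 of 5 correct
}
scaled_order = scale_items(tryout)
```

Here the item every pupil answered correctly is placed first and the item most pupils missed is placed last, which is the ascending order a scaled test requires.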


Excerpts from Iowa Every-Pupil Test of Basic Skills in Language¹

PART III, USAGE

Directions: In each of the following sentences there
are two numbered words or phrases enclosed in
brackets. If you think the first word or phrase is
correct, place an X in the first box of the row that is
numbered the same as the sentence; if you think the
second answer is correct, place an X in the second box
of the row.

Study the first two exercises carefully and see how
they are marked on the answer sheet. Mark the
other exercises in the same way.

1. The ball hit [choices illegible]
2. That [1. isn't / 2. illegible] the way to do it.
3. The glass is [1. broke / 2. broken].
4. Why [1. don't / 2. illegible] Polly come?

75. Either you or I [1. are / 2. illegible] going to win the prize.
76. There is the boy [1. who / 2. illegible] I think will win the race.
77. Each of the men [1. was / 2. illegible] given two blankets.
78. If Jack [1. were / 2. illegible] as old as you, he would help you.


1 H. F. Spitzer and others, Iowa Every-Pupil Tests of Basic Skills, Test C, Basic
Language Skills, Advanced, Form O. Published by Houghton Mifflin Co., 1943.

The scaled test is illustrated by a sampling of items from the
English usage section of the Iowa Every-Pupil Tests of Basic Skills.
The items are the first four and the last four from the advanced 
level of the test and are designed for use in Grades 5 to 9. 


Speed and power tests 


Tests of speed or rate and of power are used both in educational 
and intelligence testing, although personality inventories seem to 
involve neither the speed nor the power concept. 

A speed test usually consists of items approximately equal in 
difficulty. Ordinarily such a test contains so many items that no 
pupil is able to finish in the working time allowed. Usually the items 
are so easy that there is no question about the pupil's ability to 
answer them correctly. The number of items answered correctly in 
the specified time is taken as a measure of the pupil's speed or rate 
of work. Thus, speed tests are measures of the speed and accuracy 
with which a pupil is able to respond to standardized items of a 
uniform degree of difficulty. 

A power test consists of a series of items arranged in ascending 
order of difficulty, and hence is also a scaled test. It measures a
pupil's ability to answer more and more difficult items within a given 
field. Usually no measure of the pupil's rate of work is secured, for 
the time allowed is sufficient for nearly all pupils to complete as 
many of the items as they are able to answer. In actual practice, 
however, the factors of power and speed are combined by taking as 
the pupil's score the number of items he answers correctly in the 
specified time. Theoretically, a pupil’s score on a power test should 
represent the degree of difficulty of the most difficult item he is 
able to answer correctly, but such a score is so hard to obtain 
that the number of items answered correctly is generally taken as his 
score. A work-limit test, as the term is used in intelligence testing, 
is a power test on which the pupil may work until he is satisfied that 
he has done all he can or at least until practically every pupil in the 
test group has finished. 

If a speed test may be compared to a race in which as many hurdles 
of uniform height as possible are to be cleared during a specified 
period of time, a power test may be compared to a contest in which 
the hurdles to be jumped regularly increase in height from very low 
at the start to such eventual height that no one is able to jump the
next hurdle. In the first case, the score (speed) would be expressed
in terms of the number of hurdles jumped during the specified time. 
In the second case, the score (power) would be expressed as the 
height of the last and highest hurdle the individual was able to 
clear. 
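
The contrast between the two kinds of score can be made concrete with a small sketch. The Python functions below are hypothetical illustrations and not scoring rules prescribed by any published test: the speed score counts the items answered correctly in the time allowed, while the theoretical power score reports the difficulty of the hardest item passed. The pupil records and the 1-to-6 difficulty values are invented.

```python
def speed_score(answers_in_time):
    """Speed score: number of items answered correctly within the time limit."""
    return sum(1 for correct in answers_in_time if correct)

def power_score(items):
    """Theoretical power score: difficulty of the hardest item answered correctly.

    `items` is a list of (difficulty, correct) pairs; returns None if
    nothing was answered correctly.
    """
    passed = [difficulty for difficulty, correct in items if correct]
    return max(passed) if passed else None

# Hypothetical pupil on a speed test of uniformly easy items.
speed = speed_score([True, True, False, True, True, True])

# Hypothetical pupil on a power test; difficulties rise from 1 to 6.
power_items = [(1, True), (2, True), (3, True), (4, False), (5, True), (6, False)]
power = power_score(power_items)
number_right = sum(1 for _, correct in power_items if correct)
```

For the hypothetical power-test pupil the hardest item passed has difficulty 5, yet only four items were answered correctly. The two figures need not agree, and the second, the number right, is the score ordinarily reported in practice.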

Many modern tests are really hybrids resulting from a combination 
of the power idea and the speed idea. They are made up of items 
arranged in ascending order of difficulty, but the resulting scores are 
expressed in terms of the number of items answered correctly in the 
specified working time. Since in achievement testing sufficient time 
is ordinarily allowed for at least 80 or 90 per cent of the pupils
to finish, the speed factor does not receive much weight in the
resulting scores. A time-limit test, however, common in the measurement
of intelligence, is a power test given with such limited timing
that no pupil is likely to complete it during the time allowed. Used in 
intelligence testing when a measure reflecting both power and speed 
is desired, the time-limit test has no direct counterpart in educational 
testing. 

Furthermore, certain multiple-attribute tests of achievement com- 
bine the measurement of power and speed in one test or performance 
although not in one score. Thus, accuracy and speed in typewriting, 
legibility and speed in handwriting, and comprehension and speed 
in reading are dual, although not necessarily equally important, 
characteristics of a good performance. Inasmuch as stress on speed 
usually reduces quality and stress on quality typically reduces speed 
of performance, an optimum balance between the two characteristics 
is ordinarily desired. Even here, however, the relative demands for 
quality and speed may vary considerably both in school and in out- 
of-school situations where such performances are common, so that 
no fixed pattern of weighting for the power or quality and the speed 
scores is feasible. 


Verbal, non-verbal, and performance tests 


Another classification cross-cutting educational, intelligence, and 
personality tests is that dependent on the degree to which words 
are used in test items and in pupil responses. 

Verbal tests, by far the most common, are ordinarily of the pencil- 
and-paper variety although they may be oral or may even require 
identification of physical objects and materials presented. In any
event, words are used by the pupils in attaching meaning to, in responding
to, or both in comprehending and in responding to, the
test items. A test unless qualified or further described is ordinarily a
verbal test of the pencil-and-paper type. 

Non-verbal tests, again of the pencil-and-paper or even oral variety, 
are those in which pupils do not use words in attaching meaning to 
or in responding to test items. Tests involving the use solely of 
numbers, of graphical representations, or of three-dimensional ob- 
jects and materials are of this type. 

Performance tests are also non-verbal but they may require use 
of pencil and paper by the pupils in responding, they may require 
solely the manipulation of physical objects and materials, or they 
may require paper-and-pencil responses to physical objects and ma- 
terials presented in certain ways. Such tests are commonly used with 
persons having serious language handicaps and in situations where 
certain types of skills are of greater importance than is verbaliza- 
tion ability. 

Distinctions between verbal and language tests and non-verbal 
and non-language tests should perhaps be made here. A verbal test 
is necessarily a language test. But a test may involve language, either 
oral or written, in the instructions given to the pupils and neverthe- 
less be non-verbal if the test itself and the pupil responses do not 
involve language. Some performance tests must be non-language as 
well as non-verbal, and hence involve the giving of instructions in 
pantomime. 


Teacher-made and standardized tests 


A distinction most fundamental in educational testing, also ap- 
plicable to personality measurement but not pertinent in intelligence 
testing, is that between the teacher-made and the standardized test. 
The teacher often constructs educational achievement tests and some- 
times develops informal inventories or questionnaires for measuring 
such personality characteristics as interests and attitudes. Standard- 
ized tests occur in all three major areas of measurement—educa- 
tional, intelligence, and personality. 

The most common types of teacher-made tests are the oral, essay, 
and informal objective. The oral test is typically developed by the 
teacher in the classroom as the occasion warrants and consists of 
asking individual pupils questions to be answered orally. The essay
test, consisting of questions to which the pupils respond in writing,
is also ordinarily used by a teacher in his own classes. 

Informal objective examinations, most often prepared by a teacher 
for use in his own classes, may be constructed cooperatively by two 
or more teachers for use with their several classes in the same sub- 
ject, or even by several persons for use throughout a large school 
system. Such tests may even be printed. They are, however, informal 
objective examinations unless procedures for standardization are 
carried out and the tests are made available for general use to inter- 
ested persons outside of the school situation in which they originated. 
Illustrations of item types commonly used in informal objective 
tests are given later in this chapter. 

A standardized test is composed of test or inventory items selected 
in the light of the particular type of achievement, mental ability, or 
personality trait the instrument is designed to measure. The items 
have necessarily been subjected to a preliminary tryout with a rep- 
resentative pupil group so that it became possible to arrange them 
in the desired manner with respect to difficulty and the degree to
which they effected certain types of discriminations among groups 
of pupils. Such a test is accompanied by the appropriate type of 
table for transforming resulting scores into meaningful characteriza- 
tions of pupil achievement, mental ability, or personality. 

Both the informal objective test and the standardized test come 
under the broader heading of objective tests. When used in measur- 
ing educational achievement, the informal objective and standardized 
tests typically make use of the same item types. Both of these exami- 
nation types are marked by three important features: (1) brevity of 
pupil response, (2) extensive sampling, and (3) absence of personal 
judgment in the scoring of the examinations. The pupil indicates his 
response by such simple physical reactions as underlining a word, 
encircling a number, filling an answer space, or writing a word or 
short phrase in an indicated place. 
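
Because scoring of this kind calls for no personal judgment, it can be reduced to a comparison of each response with a key. The short Python sketch below illustrates the idea only; the function name, the three-item key, and the pupil's responses are hypothetical and are not taken from any published test.

```python
def score_objective_test(key, responses):
    """Count the responses that agree with the key (case and spacing ignored)."""
    def normalize(answer):
        # Lower-case the answer and collapse runs of spaces.
        return " ".join(str(answer).lower().split())

    return sum(
        1
        for item, correct in key.items()
        if item in responses and normalize(responses[item]) == normalize(correct)
    )


# Hypothetical key: one alternate-response, one multiple-choice, one recall item.
key = {1: "T", 2: "c", 3: "Abraham Lincoln"}
pupil = {1: "t", 2: "c", 3: "abraham  lincoln"}
print(score_objective_test(key, pupil))  # 3
```

Any two scorers applying the same key obtain the same score, which is the sense in which such a test is objective. An omitted item simply earns no credit.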


Tests, non-test tools, and techniques 


An important but perhaps fairly obvious distinction exists among 
tests, non-test instruments or tools, and techniques neither of a test 
nor of a tool nature. The distinction is important particularly in 
educational and personality measurement but also pertains in some 
degree to mental measurement. 

Tests, discussed above, are used directly by the pupil. There are 
a number of non-test tools, such as the cumulative record, pupil pro- 
file, progress chart, class analysis chart, and report card, which are 
not used by the pupil who is being evaluated. In addition, there are 
a number of techniques, primarily observational in nature, such as 
the anecdote, the case study, the interview, sociometric methods, and 
techniques in the area of group dynamics. 


2 EDUCATIONAL TESTS 


Considered as educational here and throughout this book are all 
instruments and techniques designed to measure what the individual 
has learned both in and out of school. It is obviously impossible to 
be certain about the exact proportions of the attainments of a school 
pupil that are the result of direct classroom instruction, of the by- 
products of classroom and other school activities, and of the wide 
range of his out-of-school experiences. A rather wide variety of tests, 
of other instruments, and of techniques for measuring types of abili- 
ties not definitely taught in any classroom or even in the school 
should be considered educational, for the education of the child is 
not confined entirely to the hours he spends in school. This broad 
conception of educational measurement and evaluation underlies the 
point of view presented in this volume. Various aspects of educa- 
tional testing are dealt with in greater detail in Chapters 5 to 9 
and in the section of this book devoted to measurement and evalua- 
tion in various subject fields. 

When examinations, other tools, and techniques are classified in 
terms of their form or structure, five types may be distinguished: 
(1) oral examinations, (2) essay examinations, (3) objective exami- 
nations and scales, (4) performance tests and scales, and (5) other 
evaluative instruments and techniques. 


Oral examinations 


Oral questioning of pupil groups is used in the classroom for 
measuring recall of factual knowledges. Such questioning often con- 
stitutes a major part of the so-called recitation, in fact. It usually 
consists of asking pupils sequentially or in a somewhat random 
order to answer questions based on the assignment for the day and 
of attempting to evaluate the quality of their responses. Oral 
examining may take many forms, some of which are inappropriate but 
others of which are sound as a measurement device. A section of 
Chapter 6 deals with the oral examination in some detail. 


Essay examinations 


In the essay examination, a limited number of questions (usually 
five to ten) are stated by the teacher as a basis for written answers 
by the pupils. Typically, the questions are selected by the teacher 
to elicit essay-type responses on the subject matter of the course 
the individual pupils have learned. This type of examination fre- 
quently poses a question of the who, when, where, what, or why 
type, although it may ask pupils to name, to locate, to discuss, to 
evaluate, to distinguish between, to define or describe, to illustrate 
or explain, to give reasons for or causes of, or otherwise respond to 
more-or-less definite issues. 

The essay-type examination is often used as a final examination 
or as a test over several weeks of course work. As such, it may be 
thought of as the essay examination proper. It is also often used in 
shorter form as a check on pupil preparation of assignments, in 
which situation it is usually known as a written quiz. This form of 
examination has both inappropriate and sound uses and may result 
either in accurate or inaccurate judgments concerning pupil achieve- 
ment. A quite complete discussion of the essay examination and of 
means for insuring its accuracy as a measuring instrument when it 
is appropriately used appears in Chapter 6. 


Objective examinations and scales 


As was pointed out in a preceding section of this chapter, the dis- 
tinction between informal objective tests and standardized tests of 
achievement is concerned with matters other than the types of test 
items employed. The item forms used in the informal objective test 
are limited only by the degree of ingenuity employed by the teacher 
or teachers who construct it. Similarly, the quality of the informal 
objective test and its appropriate uses are below those of the stand- 
ardized test only if the constructors lack the ability and the desire 
to construct a good test. The principles of objective test 
construction are so closely similar for these two types of tests that they are 
treated here under the more general heading of objective tests. 

A tremendous variety of item types has been developed, and new 
adaptations are quite common. However, all objective items may be 
classified either as the recognition or the recall type. Recognition 
types, of which the alternate-response, multiple-choice, and match- 
ing forms are the most common, make only indirect demands upon 
the initiative of the pupil, inasmuch as the factual material basic 
to the issue in question is stated (or misstated) in the item. Recall 
types, however, of which the simple recall and completion forms are 
probably the most common, place demands upon the initiative and 
frequently the memory of the pupil by expecting him to supply and 
state the correct answer. 

The illustrations below show how an item of factual knowledge can be 
measured by the three briefest of the above types. The first two 
are recognition and the third is recall in form. 


1. The President of the United States in 1863 was Abraham Lincoln. (T) F 

2. The President of the United States in 1863 was (a) Ulysses S. Grant, 
(b) Millard Fillmore, (c) Abraham Lincoln, (d) Andrew Johnson, 
(e) Zachary Taylor. (c) 

3. The President of the United States in 1863 was (Abraham Lincoln) 


The tremendous variety of objective examination item types and 
the complexity of some of them make impossible the presentation 
of more than a few of the most common forms here. A comprehensive 
treatment of this important type of examination is given in Chapters 
5 and 7. 

Survey, inventory, and prognostic tests. These three types of tests 
serve different purposes and are constructed on somewhat different 
lines, but all three may be considered general tests in the sense that 
their functions demand resulting scores which have general signif- 
icance rather than highly specific or analytic meaning. 

Survey tests are instruments that measure general achievement in 
certain subjects or fields of knowledge. They are used to test skills 
and abilities of widely varying types. Thus, a survey test might 
measure achievement in first-year algebra. Another, and broader, 
survey test might measure ability in all areas of mathematics at the 
high-school level. A still broader survey test might measure abilities 
in all of the major areas of the secondary-school course of study. 

Inventory tests are similar to survey tests in specific school 
subjects, although the similarity is restricted to their form rather than 
to their use. Whereas survey tests are often used after instruction or 
during the instructional process, inventory tests are intended for use 
prior to instruction as an aid to the teacher in keying his instruc- 
tion to the background learnings and levels of advancement of his 
pupils. 

Prognostic tests are intended for use in the prognosis or predic- 
tion of future success in specific subjects of the school curriculum. 
As they usually test the background skills and abilities found to be 
prerequisite for success in the particular subject, prognostic tests are 
most common among subjects in which success can be rather well 
defined in terms of certain basic abilities. They also frequently test 
some of the aptitude factors that are not directly dependent upon 
previous training of a specific type. Therefore, prognostic tests, prob- 
ably most closely related to aptitude tests but not unrelated to in- 
ventory tests, are more properly classified as educational tests than 
as intelligence tests, although they unquestionably do measure cer- 
tain special aspects of intelligence. The accompanying illustration 
from the Prognostic Test of Mechanical Abilities shows how previous 
learning contributes to success on this type of test. 


Excerpt from Prognostic Test of Mechanical Abilities 2 

TEST II. READING SIMPLE DRAWINGS AND BLUEPRINTS 

16-30. DIRECTIONS: The following are exercises in reading simple drawings 
and blueprints. Read each statement, look at the drawing, and write the 
letter that appears before the best answer on the line to the right of the 
statement. Do not use a ruler or [remainder of directions not reproduced]. 

[The drawing and the lettered answer choices for each item are not legible 
in this reproduction; only the item stems are given.] 

16. The length of line A is: 
17. The [word illegible] of line C is: 
18. The [word illegible] from line C to line D is: 
19. The [word illegible] of the longest line is: 
20. The length of the shortest line is: 
21. Line A is shorter than line B by: 

2 J. Wayne Wrightstone and Charles E. O'Toole, Prognostic Test of Mechanical 
Abilities, Form A. Published by California Test Bureau, 1946. 

Diagnostic and analytic tests. Tests of diagnostic and analytic 
types are intended for the separate measurement of rather specific 
aspects of achievement in a single subject or field. Diagnostic tests 
measure somewhat narrower aspects of achievement than do analytic 
tests, so they may be thought of as serving specific and general 
diagnostic functions respectively. 

Diagnostic tests yield measures of highly related abilities under- 
lying achievement in a subject. They are designed to identify par- 
ticular strengths and weaknesses on the part of the individual child, 
and within reasonable limits to reveal the underlying causes. 

The relation of each skill to other skills and to the total process 
in the case of multiplication of fractions and mixed numbers is shown 
in the accompanying reproduction of Test VII of the Compass Diag- 
nostic Tests in Arithmetic. The diagnostic procedure here is based 
on the assumption that mastery of the total process can be no 
stronger than the weakest link in the chain of related skills. Accord- 
ingly, each skill called into play in the total process so far as pos- 
sible is isolated and measured. The parts of the test not reproduced 
here deal with reducing answers to best form, fundamentals of mul- 
tiplication of fractions, and finding errors. 
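
The weakest-link assumption can be pictured as a search for the part of the test on which the pupil's proportion of correct responses is lowest. The Python sketch below is offered only as an illustration. The part names follow the description above, the maximum of 18 points is that printed for Parts 1 and 2 and is merely assumed for the third part, and the pupil's scores are invented.

```python
def weakest_link(part_scores, part_maxima):
    """Return (part name, proportion correct) for the lowest-scoring part."""
    proportions = {
        part: part_scores[part] / part_maxima[part] for part in part_maxima
    }
    weakest = min(proportions, key=proportions.get)
    return weakest, proportions[weakest]


# Hypothetical record for one pupil; the scores are invented for illustration.
maxima = {
    "changing mixed numbers to improper fractions": 18,
    "cancellation": 18,
    "reducing answers to best form": 18,  # assumed maximum
}
scores = {
    "changing mixed numbers to improper fractions": 16,
    "cancellation": 9,
    "reducing answers to best form": 15,
}
part, share = weakest_link(scores, maxima)
print(part, share)  # cancellation 0.5
```

On the diagnostic view, remedial teaching for this pupil would begin with cancellation, since mastery of the total process is held to be limited by that skill.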

Analytic tests may be considered as general diagnostic tests. The 
term "diagnostic" as applied to educational tests has resulted in many 
misconceptions. Fundamentally, all tests may be considered diag- 
nostic in the sense that they actually yield useful information about 
pupil achievement. However, the diagnosis afforded by many present- 
day tests is extremely general. Many so-called diagnostic tests are 
not diagnostic, but are merely analytic tests. 

In contrast to the specific diagnosis that appears to be possible 
in the case of arithmetic is the general type of analysis that seems to 
climax the best efforts of test makers in the fields of language, read- 
ing, science, and the social studies. Attempts to analyze language 
and reading, for example, with a view to the construction of diag- 
nostic instruments, immediately encounter the impossibility of re- 
lating to each other in any causal way the several phases of the 
subject on which achievement in it seems to depend. Causal rela- 
tionships have not been established among such common factors in 
silent reading ability as word meaning, rate of reading, comprehen- 
sion of facts, and ability to get the main idea. Consequently, in many 
subjects measures of different abilities are necessarily treated as 
independent and unrelated aspects of total ability. 


Excerpt from Compass Diagnostic Tests in Arithmetic 3 


PART 1—CHANGING MIXED NUMBERS TO IMPROPER FRACTIONS 

Directions: Change the mixed numbers below to improper fractions. 
Study the samples before you begin to work. 

[The samples and exercises, a set of mixed numbers to be rewritten as 
improper fractions, are not legible in this reproduction.] 

Score on Part 1 = Number right ...... 
[Total possible score = 18 points] 


PART 2—CANCELLATION IN MULTIPLICATION OF FRACTIONS 

Directions: Do all of the cancelling possible in the fractions below. Do 
not take time to finish the examples. Simply show all of the cancelling. 
Study the samples carefully before you begin to work. 

[The samples and exercises, a set of products of fractions in which 
cancelling is to be shown, are not legible in this reproduction.] 

Score on Part 2 = Number right ...... 
[Total possible score = 18 points] 


An illustration may serve to summarize the essential features of 
diagnostic and analytic tests. On certain points along the rim of the 
Grand Canyon there are lookout stations equipped with telescopes, 
each pointed and focused upon a specific spot of beauty or grandeur. 
From each of these a separate view of the beauty spots of the 
canyon is secured. From the composite of all of these views, a much 
more accurate appreciation of the total panorama is obtained, yet 
each view is quite independent of every other one. This is typical of 
the way in which tests of the analytic type operate. In distinct con- 
trast with this example, the best way to illustrate the operation of 
the diagnostic type of test is to liken it to an inverted pyramid made 
of bricks. The removal or the crumbling of a single brick at any 
point in the wall will cause it to fall. The accompanying diagram 
may be helpful in clarifying the essential differences in these two 
types of tests. 


3 G. M. Ruch, F. B. Knight, H. A. Greene, and J. W. Studebaker, Compass 
Diagnostic Tests in Arithmetic, Test VII, Form A. Copyright by Scott, 
Foresman and Co., 1925. Reprinted by permission. 


[Diagram not reproduced; the only label that survives is "Column Addition."] 

Fig. 1. Contrast between analysis and diagnosis 


Quizzes and mastery tests. Instruments of these types are most 
often teacher-made, for they are ordinarily used in particular courses 
when the teacher sees need for them. They differ from most objective 
tests in length and in function more than in form. By nature the 
scope of these tests is quite restricted. 

Quizzes, typically consisting of ten to fifteen true-false items or 
eight to ten simple recall items or problems, are used by many 
teachers on occasion during a portion of a class period for the pri- 
mary purpose of determining whether or not the pupils have read 
assigned materials. Although they may be announced in advance, 
they frequently are given without forewarning to the pupils. 

Mastery tests are designed to measure only those fundamental 
skills and abilities that all pupils supposedly have acquired, so 
the tests are at a very low level of difficulty for most pupils. Perfect 
scores are therefore commonly made by a majority of the pupils. 
These tests are typically constructed by the teacher for use in his 
own classes, although some of the workbooks designed for use in 
particular courses include tests of this type. 

Instructional and practice tests. Tests of these types are some- 
times constructed by the teacher, but they are also included in many 
of the workbooks provided particularly for elementary-school sub- 
jects and to some degree for high-school subjects. The scope of a test 
of this type is narrow, for typically only one aspect of a skill or one 
phase of a content area is covered. The exercises in such instruments 
may be either objective or semi-objective, for the primary value lies 
in their use as teaching aids. 

Variously called instructional tests, practice tests, drill tests, and 
practice exercises, these instruments merit brief mention here be- 
cause they are measuring instruments even though they are designed 
primarily for teaching rather than for testing purposes. 


Source scales. Used primarily if not solely in the area of spelling, 
source scales, frequently referred to as product scales, are instru- 
ments used in constructing tests. The instruments themselves are 
never placed in the hands of the pupils. The spelling words of a 
given degree of difficulty are grouped together and are placed on a 
scale of difficulty from easy to difficult. The teacher wishing to con- 
struct a test of a given degree of difficulty for a certain grade may 
do so by selecting appropriate words from the source scale. 

The following excerpt from the Iowa Spelling Scales for the eighth 
grade shows the words in Steps 12 and 15, for which the average 
percentages of misspellings are 58 and 34 respectively. On the av- 
erage, the word “client” in Step 12 is misspelled by 62 per cent and 
the word “canvass” in Step 15 by 31 per cent of eighth-grade pupils. 
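
The use of a source scale in building a test can be sketched in a few lines of Python. The sketch is illustrative only, and the function names are hypothetical. The percentages for "client" and "canvass" are those stated above; the other entries are read from the accompanying excerpt and should be checked against the published scale before any use.

```python
def select_words(scale, low, high):
    """Return words misspelled by between low and high per cent, hardest first."""
    chosen = [(pct, word) for word, pct in scale.items() if low <= pct <= high]
    ordered = sorted(chosen, key=lambda pair: (-pair[0], pair[1]))
    return [word for pct, word in ordered]


def average_difficulty(scale, words):
    """Average percentage of misspelling for the selected words."""
    return sum(scale[word] for word in words) / len(words)


# Word -> per cent of eighth-grade pupils misspelling it.
scale = {
    "client": 62,
    "convenient": 62,
    "analysis": 60,
    "definitely": 37,
    "zephyr": 34,
    "canvass": 31,
}
easier_test = select_words(scale, 30, 40)
print(easier_test)                             # ['definitely', 'zephyr', 'canvass']
print(average_difficulty(scale, easier_test))  # 34.0
```

A teacher wanting a harder test would raise the limits, for example to 58 and 62 per cent, and draw from the Step 12 words instead.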


Excerpts from Iowa Spelling Scales 4 


Step 12: 58% 

62%: client, convenient, council, immense, permanent, principle 
61%: accredited, characteristic, [word illegible], scientific 
60%: analysis, correspondence, mortgage, Sabbath 
59%: enthusiastic, lieutenant, unusually 
58%: all right, alumni, anticipate, assessment, candidacy, continuous, 
     fundamental, geometry, [word illegible], physician, possess, thorough 
57%: thoroughly 
56%: accompanying, acquaintance, auntie, originally, recommendation 
55%: anticipating, circuit, disappoint, equipped, immediately 

Step 15: 34% 

37%: definitely, fraternally 
35%: anniversary, [word illegible] 
34%: zephyr 
33%: [two words illegible] 
31%: canvass 


4 Ernest J. Ashbaugh, The Iowa Spelling Scales, Grade VIII. Published by Bureau 
of Educational Research and Service, State University of Iowa, 1944. 


Performance tests and scales 


Tests of this type involve the manipulation or at least the use 
in some form of two-dimensional and perhaps primarily of three- 
dimensional objects. Paper and a pencil are sometimes necessary 
for the pupil in taking such a test, but in other situations the pupil 
actually constructs or manipulates some physical materials in the 
attempt to demonstrate his skill. Performance tests are dealt with 
in Chapter 8 of this volume. 

Object tests. In tests of this type, physical objects are presented 
to the pupils for them to identify by name or type, for them to 
classify in some prescribed manner, or for them to use in identifying 
certain characteristics of the objects. Manipulation of the objects 
is not entailed unless handling is necessary in obtaining answers to 
the questions. Pupil responses are often made by the use of pencil 
and paper, but this situation differs in obvious ways from a 
paper-and-pencil test where the same objects are reproduced in photographs 
or line drawings. Object tests are most often used in testing ability 
to identify and discriminate among the various tools, materials, and 
specimens used in the industrial and practical arts and the physical 
sciences. 

Quality and rating scales. In many school subjects, such as 
handwriting, composition, industrial arts, and the practical arts, pupils 
produce results in some tangible form through the application of 
their skills to assigned tasks. Their products differ in quality and in 
speed of production. The teacher's problem becomes one of evaluat- 
ing the product and perhaps of recording the time taken in its 
production. 

A quality scale is used in judging certain types of products in 
handwriting, lettering, artwork, shopwork, and other areas. Hand- 
writing scales consist of a series of specimen performances exhibiting 
varying degrees of quality from the lowest to the highest. The 
specimens are arranged systematically in order of increasing quality. 
Usually the quality of each is described numerically. A quality scale 
of this type is used by matching the performance to be described 
with the specimen of the scale that most nearly resembles it in 
quality. 

A rating scale or score card is used in evaluating other types of 
products, especially those in industrial arts and practical arts. Such a 
scale, with the numerical ratings thought to be appropriate for 
each characteristic of an excellent product, is prepared in advance. 
The teacher then is able to rate numerically the quality of each 
pupil's production by observing, measuring, and otherwise judging 
his product on each rating-scale characteristic and by obtaining the 
summation for all characteristics. 
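
The arithmetic of a score card is simply a checked summation. The Python sketch below shows one way it might be laid out; the characteristics, their maximum ratings, and the pupil's ratings are all hypothetical and do not come from any published scale.

```python
def rate_product(score_card, ratings):
    """Sum the ratings after checking each against its maximum on the card."""
    total = 0
    for characteristic, maximum in score_card.items():
        rating = ratings[characteristic]
        if not 0 <= rating <= maximum:
            raise ValueError(
                f"{characteristic}: rating {rating} is outside 0-{maximum}"
            )
        total += rating
    return total


# Hypothetical score card for a shop project: characteristic -> maximum rating.
score_card = {"accuracy of dimensions": 40, "quality of joints": 30, "finish": 30}
ratings = {"accuracy of dimensions": 32, "quality of joints": 24, "finish": 21}
print(rate_product(score_card, ratings))  # 77
```

Fixing the maximum for each characteristic in advance keeps one feature of the product from dominating the total beyond the weight the teacher intended.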


Check lists 


A check list is most often used in evaluating the procedures used 
by a pupil in the performance of some assigned skill. Such check 
lists are useful in laboratory sciences and other performance areas 
in which certain tasks are most effectively performed by the use of 
sequences and techniques previously taught to the pupil. A check 
list of actions, both appropriate and inappropriate, is prepared in 
advance. The teacher observes each pupil separately as he attempts 
to perform the assigned task and keeps a running account of his 
procedures, both good and bad, in order of occurrence. The pupil's 
performance is then evaluated in terms of how closely it compares 
with the most efficient and direct procedures for reaching the goal 
he was seeking. 


Other evaluative instruments and techniques 


To be treated here are the tests, non-test instruments, and tech- 
niques that are used in evaluating learning outcomes not closely re- 
lated to specific school subjects and used in the presentation and 
summarization of other measurement results. Knowledge of their use 
constitutes at least as important a part of the teacher's equipment 
for measuring and evaluating pupils as is true of the more formal 
instruments discussed above. Evaluative instruments and techniques 
are the subject of Chapter 9 of this volume. 

Evaluative tests. Although it is difficult and hazardous to attempt 
any distinction between evaluative tests and more formal tests, there 
are at least several types of tests which in form or in types of be- 
havior measured probably should be considered evaluative in nature. 
In general, these evaluative tests are designed to measure some of 
the relatively intangible types of instructional outcomes. 

Interpretive tests are represented by integrated units of test 
materials for the measurement of such relatively intangible outcomes 
as ability to interpret data, ability to interpret literature, ability to 
apply principles in the sciences, logical reasoning, and critical 
thinking. Tests of practices and activities, concerned with such 
out-of-school activities as health and safety practices and fiction reading, 
are also of this type. Certain tests of values in the areas of literature 
and reading, the fine arts, and recreations measure such intangible 
outcomes of school and out-of-school experiences as appreciations 
and satisfactions. In addition, some of the test batteries properly 
classified as a whole under survey tests have distinct evaluative 
features embodied in certain parts. Found in such batteries are basic 
skills tests in the portions not related directly to school subjects 
and the interpretive parts of tests of general educational 
development. 

Other evaluative tools. Among these other tools used in pupil 
evaluation are the profile chart, the progress chart, the cumulative 
record, and the report card, all used with the individual pupil, and 
the class analysis chart, used both for an over-all evaluation of the 
group and for relating the status of the individual pupil to the 
over-all group picture. 

Evaluative techniques. The interview and the questionnaire, both 
usually informal when used in pupil evaluation, are perhaps the most 
widely used techniques for evaluating educational outcomes. Both 
are, of course, used more formally for more specialized purposes. 
Some of the techniques discussed under personality evaluation later 
in this chapter, such as the anecdotal record and the case study, may 
also be used to measure educational outcomes even though their 
typical use is in the area of personality evaluation. 


3 INTELLIGENCE TESTS 


Intelligence tests measure what is perhaps most simply and most 
commonly described as ability to learn or ability to adapt oneself 
to new situations. Whereas achievement tests measure skills or abili- 
ties more or less directly, intelligence tests face the problem of 
measuring mental qualities indirectly in terms of the manner in 
which an individual's intelligence affects or conditions his behavior. 
It is sufficient here merely to comment upon this important 
distinction. Chapter 10 presents more fully the problems and techniques of 
intelligence testing. 


General intelligence tests 


The most widely known tests of mental ability are usually re- 
ferred to as general intelligence tests, although such other terms as 
general mental ability tests and psychological examinations have 
almost identical meanings. Other terms having similar meanings are 
general ability tests and aptitude examinations. General intelligence 
tests attempt to measure mental ability broadly enough, by the use 
of a wide variety of test situations in scaled order of difficulty, to 
obtain a measure representative of the individual's mental efficiency 
in general. 

Results from general intelligence tests have so many uses, as in 
educational guidance, vocational guidance, sectioning of classes, and 
diagnosis, that it is impossible at this point to do more than mention 
this fact. 

Individual intelligence scales. Intelligence tests that can be ad- 
ministered to only one person at a time are known as individual 
intelligence examinations. Such tests require the full attention of a 
trained examiner. Although the techniques for administering these 
tests are highly standardized, the examiner modifies the procedure 
in various ways according to the age, ability, and even sex of the 
pupil being tested. Since these instruments are usually in scaled form, 
and are frequently devised to cover a wide age range, they are often 
called age scales. 

Only in occasional sections do individual intelligence tests require the 
use of pencil and paper by the subject under examination. Some parts 
are even of a performance test nature. Many of the pupils’ responses 
are given orally and are recorded by the examiner. 

Group intelligence tests. Group tests of intelligence or general 
mental ability are usually paper-and-pencil tests that can be ad- 
ministered to a large group of persons at the same time. Group intelli- 
gence tests of the “omnibus” variety are ordinarily not divided into 
parts but have the items in mixed order with respect to the nature 
of the abilities they test and also sometimes with respect to their 
objective form. More commonly, however, group tests of intelligence 
have a number of different parts, each of which deals with a certain 
broad type of performance. In several of these tests two or more 
part scores are combined to obtain a verbal score and the remaining 
two or more part scores are combined to obtain a non-verbal score. 
When these two aspects of mental ability are measured by separate 
tests, the instruments are ordinarily called verbal tests and 
non-verbal tests. 
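
The combining of part scores into a verbal and a non-verbal score amounts to two separate sums. The Python sketch below is illustrative only; apart from "logical selection," which appears in the excerpt that follows, the part names are hypothetical, and all of the scores are invented.

```python
def combine_part_scores(part_scores, verbal_parts):
    """Return (verbal score, non-verbal score) as sums of the part scores."""
    verbal = sum(
        score for part, score in part_scores.items() if part in verbal_parts
    )
    non_verbal = sum(
        score for part, score in part_scores.items() if part not in verbal_parts
    )
    return verbal, non_verbal


# Hypothetical part scores for one pupil on a four-part group test.
parts = {
    "vocabulary": 22,
    "logical selection": 17,
    "figure analogies": 14,
    "number series": 11,
}
verbal_parts = {"vocabulary", "logical selection"}
print(combine_part_scores(parts, verbal_parts))  # (39, 25)
```

Published tests ordinarily convert such raw sums to derived scores by means of tables supplied with the test; that step is omitted here.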


Excerpts from Pintner General Ability Tests 5 


Pintner Verbal: Intermed.: A 
TEST 2. LOGICAL SELECTION 

Directions. Look at the sample that follows. 

Sample. A table always has— 1 flowers 2 tablecloth 3 legs 4 varnished top 5 [illegible] 

A table always has legs, which is number 3; so the third answer space is 
marked in the margin. Read each statement. Find the thing it is most likely 
to have. Then mark the answer space in the margin which is numbered the same. 

1. A forest always has— 1 snow 2 trees 3 beasts 4 a forester 5 hunters 
2. A sled— 1 boys 2 runners 3 [illegible] 4 paint 5 wood 
3. A horse— 1 tail 2 harness 3 shoes 4 stable 5 rider 
4. A train— 1 windows 2 passengers 3 wheels 4 iron doors 5 diner 
5. An orchestra— 1 hall 2 conductor 3 drum 4 instruments 5 audiences 
6. A game— 1 players 2 cards 3 tables 4 penalties 5 goals 


The accompanying sample items from the Pintner General Ability 
Tests, Verbal Series, for the intermediate grades, illustrate one of 
the techniques used in group intelligence tests. 


Specific intelligence tests 


In contrast to the general intelligence tests that attempt to measure 
broadly the ability to learn are the tests of specific intelligence that 
attempt to measure ability to learn in relatively narrow fields of 
subject matter or areas of performance. 

Aptitude tests. Aptitude tests are frequently referred to as tests of 
specific intelligence. They attempt to measure the aptitude of a person, 
and often to forecast his probable future success, in certain 
school subjects or certain areas of performance. They are designed 
for use with persons who may or may not have had previous 
experience in the achievement areas with which they deal. 
Such tests attempt to measure the potentialities for success apart 
from those abilities resulting from specific training. Aptitude tests 
are not necessarily used for predictive purposes, although that is 
probably their most common use. They are found for such areas as 
English, foreign languages, music, art, mathematics, and the sciences, 
and for such specific subjects as algebra, geometry, physics, and 
chemistry. 


5 Rudolf Pintner, Pintner General Ability Tests, Verbal Series, Intermediate, 
Form A. Published by World Book Co., 1938. 

Readiness tests. Reading readiness tests have for some years been 
used with primary-school children in order to determine whether or 
not they have reached a level of maturity necessary for success in 
reading. Arithmetic readiness tests have been devised more recently 
for use in determining whether pupils have sufficient mental maturity 
to permit efficient learning of various arithmetic skills. Although 
there might be some question concerning the classification of readi- 
ness here, it seems that particularly for children entering school for 
the first time these tests more largely measure special mental abilities 
than the results of learning. 


Group-factor tests 


Tests of group factors of intelligence have evolved since general 
intelligence and specific aptitude tests first made their appearance, 
and have recently attained considerable growth in usage. Group 
factors of intelligence are in a sense midway between specific and 
general intelligence. 

Bi-factor tests. The two group factors most widely accepted and 
embodied in testing practices are the verbal and non-verbal factors 
referred to above. Essentially the same factors are called linguistic 
and quantitative in one psychological examination, whereas in still 
another mental capacity test they are termed language and non- 
language. 

Multi-factor tests. Still newer in testing practice are the several 
tests now available for measuring from six to eleven, and in one 
test considerably more, group factors of intelligence. Perhaps the 
term most often used to characterize these multiple group factors of 
intelligence is primary mental abilities. Representative of such fac- 
tors are spatial, numerical, manual dexterity, memory span, induc- 
tion, and deduction. 


Performance tests 


Performance tests are of several types, which cut across the 
classifications of intelligence tests given above. Some are individual and 
others are group tests. Some measure general intelligence and others 
measure specific aptitudes. The term usually designates tests for 
which motor or manual responses rather than verbal or written 
responses are required of the pupil. These tests are often devised for 
use with illiterates, backward children, persons who are unfamiliar 
with English although they may read and speak a foreign language, 
and handicapped persons of various types. They frequently involve 
pantomime rather than verbal or printed directions and are usually 
at a rather low level of difficulty. 

One type is a group paper-and-pencil test of general intelligence 
requiring no handwriting, given to illiterates and others not able to 
read and speak English with ease. Another type of general 
intelligence test given to one person at a time requires such performances 
as fitting of blocks into form boards, putting together of what 
resembles a jigsaw puzzle, and imitating actions of the examiner. 
Still others, making use of manipulative tests similar to the above 
except that they require more dexterity and place a premium upon 
speed of response, are individual tests used in the measurement of 
certain types of mechanical aptitude for adolescents and even adults. 


4 PERSONALITY INVENTORIES AND EVALUATIONS 


Although psychologists are in agreement that the common 
conception of personality is not psychologically sound, they are not in 
agreement concerning the real meaning of the term. They do, 
however, believe that personality has to do with the total behavior of 
the individual, both that which can and that which cannot be 
observed. In the discussion that follows, four of the types of behavior 
generally classified under personality which seem to be most useful 
concepts to the teacher are discussed. These, as well as some other 
types of behavior usually listed under personality, are discussed 
more completely in Chapter 11. 


Attitudes scales 


The attention that has recently been called to attitudes by several 
nationwide surveys of public opinion illustrates the educational 
importance of attitudes. Attitudes are formed, crystallized, and 
sometimes modified or changed in the home, the church, on the 
playground, and elsewhere, as well as in the school. 

Attitudes scales are of several types, but they frequently are based 
on a two-, three-, or five-point scale of agreement-disagreement with 
statements concerning controversial issues or at least issues on 
which opinions may readily differ. Some such scales deal with a 
specific issue, such as attitude toward the Chinese. Others are gen- 
eralized, and may deal equally well with attitudes toward any racial 
group, or some other general quality. 

The results from the measurement of attitudes are useful in a 
variety of ways both in school and in social situations, for certainly 
attitude changes occur as one type of instructional outcome and 
the attitudes of pupils undoubtedly influence their adjustment in the 
school. 


Interests inventories 


The interests of different individuals vary tremendously. Not only 
are the individual's fields of interest sometimes obscured intentionally 
or unintentionally by his behavior, but in some instances his 
real interests may be unknown to him. Interests questionnaires use 
techniques somewhat similar to those for attitudes testing, and 
frequently request indications of the presence of interest or the degree 
of interest a person has in various occupations, modes of behavior, 
types of activity, kinds of reading, and types of recreation, to name 
only a few. Results from interests inventories are rather widely used 
in vocational guidance, and also have uses for the teacher in aiding 
him to adapt his instruction to pupil interests. 

The accompanying illustration from one of the interest parts of 
the Pressey Interest-Attitude Test shows one method of measuring 
interests in a variety of things. 


Adjustment inventories 


Adjustment inventories attempt to measure emotional adjustment 
primarily. Known by a variety of names—personality tests, per- 
sonality inventories, personality schedules, adjustment inventories, 
and in various other ways—they ask the pupil to respond objec- 
tively to items probing his behavior, his likes and dislikes, his en- 
vironment, and many other aspects of his life. A major purpose of 
such instruments is to locate those abnormalities and peculiarities of 
behavior, neurotic tendencies, and various other types of maladjustment 
that should receive immediate attention if the individual 
possessing them is to become a well-adjusted adult. 
Excerpts from Pressey Interest-Attitude Test, Test III 6 


Directions: Below is a list of things that people often like or are interested in. Place a 
cross (X) on the dotted line in front of everything which YOU like or in which YOU 
are interested. Place two crosses (XX) in front of everything in which you are VERY 
MUCH interested . . . which you like VERY MUCH. You may mark as many or as 
few words as you wish. But be sure to mark everything which you like or in which you 
are interested. 


 1 ..... artist          31 ..... card parties     61 ..... dress 
 2 ..... drawing         32 ..... dancing          62 ..... reading 
 3 ..... cartoonist      33 ..... doctors          63 ..... children 
 4 ..... movie star      34 ..... fashions 
 5 ..... engineers       35 ..... leaders 
                         36 ..... photography 
 7 ..... horseback       37 ..... poker            67 ..... social affairs 
 8 ..... soldiers        38 ..... society          68 ..... coffee 
 9 ..... typewriting     39 ..... university       69 ..... cards 
10 ..... carnival        40 ..... auto driving     70 ..... waltzes 


The accompanying excerpt from the Student Form of the Bell 
Adjustment Inventory illustrates items dealing with the (a) home, 
(b) health, (c) social, and (d) emotional adjustment of the 
individual. 


Excerpts from Bell Adjustment Inventory, Student Form 7 

DIRECTIONS 


Are you interested in knowing more about your own personality? If you will answer 
honestly and thoughtfully all of the questions on the pages that follow, it will be possible 
for you to obtain a better understanding of yourself. 

There are no right or wrong answers. Indicate your answer to each question by drawing 
a circle around the "Yes," the "No," or the "?". Use the question mark only when you are 
certain that you cannot answer "Yes" or "No." There is no time limit, but work rapidly. 

If you have not been living with your parents, answer certain of the questions with 
regard to the people with whom you have been living. 


Yes  No  ?  Do you day-dream frequently? 
Yes  No  ?  Do you take cold rather easily from other people? 
Yes  No  ?  Do you enjoy social gatherings just to be with people? 
Yes  No  ?  Does it frighten you when you have to see a doctor about some illness? 
Yes  No  ?  At a reception or tea do you seek to meet the important person present? 
Yes  No  ?  Are your eyes very sensitive to light? 
Yes  No  ?  Did you ever have a strong desire to run away from home? 
Yes  No  ?  Do you take responsibility for introducing people at a party? 
Yes  No  ?  Do you sometimes feel that your parents are disappointed in you? 
Yes  No  ?  Do you frequently have spells of the "blues"? 


6 S. L. Pressey, Interest-Attitude Test. Published by Psychological Corporation, 
1933. 

7 Hugh M. Bell, The Adjustment Inventory, Student Form. Published by 
Stanford University Press, 1934. 
Evaluative techniques 


Paper-and-pencil instruments are useful in measuring attitudes, 
interests, and even adjustment, but they cannot be used in measuring 
conduct in the many life and school situations where paper and a 
pencil are not natural parts of the environment. A number of other 
techniques, most of them observational, are used in individual pupil 
evaluation and several are intended for use in the evaluation of group 
behavior. 

Evaluation of individual behavior. Several techniques have been 
evolved for noting the overt behavior of the whole child in natural 
as well as controlled situations. Among such techniques of greatest 
concern for the teacher are anecdotal records, a wide variety of pro- 
jective methods, and the case study. The anecdotal record consists 
of a factual narrative of a particular situation observed by the re- 
corder in which the pupil about whom the anecdote is written played 
a significant and relatively unique role. In the projective method, the 
child's behavior is observed and interpreted by a trained psychologist 
in a controlled situation where the child must react in some observ- 
able manner to materials presented by the psychologist. The case 
study, constituting a summary of a variety of observations and 
measurements of the individual, involves the assembly, integration, 
and interpretation of all important and obtainable information 
concerning the origins, background, environment, and status of the pupil. 

Evaluation of group dynamics. Included in the evaluation area of 
group dynamics are two techniques of significance for the teacher. 
The first, the sociogram, employs paper-and-pencil sociometric 
methods for determining group patterns of behavior and the place of the 
individual within the group. The second, analysis of group interactions 
by direct observation in controlled situations, again permits evaluation 
both of group behavior and of individual conduct within the framework 
of the social situation. 


Topics for Discussion 


1. Distinguish the three general types of tests: educational, mental, 
and personality. 

2. Distinguish between tests and scales. What are scaled tests? 

3. Indicate the major differences between speed and power tests. 

4. Distinguish among verbal, non-verbal, and performance tests. 

5. Briefly characterize the four forms of educational tests: oral 
examination, essay examination, objective examination, and 
performance test. 

6. Indicate the major characteristics of survey and prognostic tests. 
Of diagnostic and analytic tests. 

7. Illustrate several types of items used in objective examinations. 

8. Discuss several types of educational performance tests. 

9. For what achievement areas are source scales and quality scales 
provided? How are these scales related to tests? 

10. What types of evaluative tests and techniques are used in 
educational measurement? 

11. Distinguish among tests of general intelligence, of specific 
intelligence, and of group factors of intelligence. 

12. Briefly note some of the differences between individual and group 
tests of general intelligence. 

13. What do aptitude and readiness tests measure? For what fields are 
they provided? 

14. What do group-factor tests measure? In what areas are they 
provided? 

15. Briefly indicate the major characteristics and uses of performance 
tests of mental ability. 

16. Briefly characterize attitudes scales, interests inventories, 
adjustment inventories, and other evaluative techniques used in 
personality measurement. 
4 


Essential Qualities of a Good Measuring 
Instrument or Technique 


THE FOLLOWING aspects of the criteria or distinguishing character- 
istics of a good examination, non-test instrument, or evaluative tech- 
nique are discussed in this chapter: 


A. Validity as an essential characteristic of good measurement 
or evaluation. 

B. Curricular and statistical validity. 

C. Reliability as an aspect of validity. 

D. Methods of determining and estimating reliability. 

E. Dependence of reliability upon adequacy and objectivity. 

F. Administrability, scorability, and economy as practical 
criteria. 

G. Comparability in the use of test and evaluation results. 

H. Utility as an over-all criterion of good measurement. 


The selection of any standardized educational test, mental test, or 
personality inventory requires careful consideration of the charac- 
teristics of a good examination. Similarly, the construction of any 
test or non-test instrument and the preparation of evaluative tech- 
niques, whether educational, mental, or in the area of personality, 
require careful consideration of the characteristics of good measure- 
ment. Although the characteristics of a good examination, tool, or 
technique can be listed and classified in many different ways, test 
specialists are in general agreement concerning the aspects that 
should receive attention in selecting or constructing them. The 
criteria discussed below undoubtedly represent the most important 
considerations to be taken into account.1 

It is recommended that the student refer frequently to the 
discussion of Chapter 14 on the statistical methods of determining test 
validity and reliability in connection with the study of these two 
exceedingly important criteria. An adequate understanding of these 
criteria depends on both their theoretical and their statistical aspects. 


1 VALIDITY 


Validity is the most important characteristic of a good examination, 
for unless a test is valid it serves no useful function. The 
validity of an examination depends on the efficiency with which it 
measures what it attempts to measure. A test must, therefore, 
accomplish the purpose the user has in mind in order to satisfy this 
fundamental criterion for all testing. In fact, the uncritical acceptance 
of an invalid test by a teacher for performing a desired function 
might easily result in serious injustice to the pupils. Accordingly, 
teachers cannot be too careful in assuring themselves of the validity 
of the tests they use. For example, a teacher who used a test that 
measures only knowledge of facts in a course in American history 
would not be correct in drawing conclusions, on the basis of the 
results, about the abilities of his pupils to apply historical facts to 
the reasoned interpretation of events. 

It follows, also, that a test must be used with pupils who possess 
the proper intellectual maturity and background of experience for 
taking the test if it is to possess validity. For example, a standardized 
arithmetic survey test intended for use with pupils in Grades 
6 to 9 might be invalid for use with most of the pupils in Grade 5 
and probably with all pupils in the lower grades. 

Lindquist 2 illustrated validity by pointing out that a test of high 
validity for ranking high-school pupils on general achievement in 
United States history would have constantly decreasing validities for 
testing college students over a course in the same subject, for testing 
high-school pupils over a course in economic history of the United 
States, for predicting future success in a secondary-school English 
history course, for diagnosing weaknesses in abilities in United States 
history, for measuring general intelligence, and, finally, as a basis 
for assigning course marks in manual training. 


1 Although the discussion of this chapter is typically in terms of test or 
examination criteria, in order to avoid cumbersome wording, the reader should bear in 
mind the fact that the broad and general rather than the narrower, special use of 
the term is intended. The criteria should be interpreted as applying in only slightly 
modified form to non-test evaluative tools and to evaluative techniques as well as 
to tests and examinations. 

2 Herbert E. Hawkes, E. F. Lindquist, and C. R. Mann, editors, The Construction 
and Use of Achievement Examinations. Houghton Mifflin Co., Boston, 1936, 
p. 21-22. 

Validity is, therefore, a specific rather than a general criterion of 
a good examination. It is specific in the sense that a test may be 
highly valid for use in one situation and highly invalid for use in 
another. It is specific, also, in the sense that a test may be 
valid for use with one group of pupils but not for use with a differ- 
ent pupil group. Tests cannot correctly be described as valid in gen- 
eral terms, but only in connection with their intended use and at the 
intended ability level of the pupils. 

There is a difference in the concept of validity which should be 
applied in the consideration of standardized and informal objective 
examinations. It is readily apparent that the teacher better than anyone 
else knows the content and emphases of the course he has taught. 
Therefore, in that sense, he is the person best qualified to construct a 
valid test for his course. However, it is frequently true that the 
makers of standardized tests are better able than many, if not most, 
classroom teachers to determine what commonly are, and perhaps 
what should be, the content and emphases in courses for which they 
construct and standardize tests. Therefore, it seems reasonable to 
conclude that insofar as test content is concerned the teacher is the 
person best qualified to test the attainment of the desired outcomes 
in his particular class, but that the standardized test affords a su- 
perior means for determining how well his pupils have attained the 
core outcomes that are most widely recognized as being desirable 
in the particular course he is teaching. This difference in the appli- 
cation of the concept of validity is the result of the fact that no two 
teachers teach exactly the same course and that no one teacher 
teaches exactly the same course twice during his lifetime. Although 
this is particularly true for courses in the contemporary social studies 
and literature, and in the sciences, in which new content must be 
introduced constantly to keep abreast of developments, it is true 
even of such subjects as mathematics, in which the methods used 
and classroom problems that arise may well differ from semester to 
semester even though the basic content may be largely unchanged. 
Three types of test validity are discussed here: (1) curricular 
validity, (2) statistical validity, and (3) psychological and logical 
validity. Of these three, the first is by far the most important, for 
in the final analysis any method of test validation must be based 
on relatively subjective judgments concerning the degree to which 
an examination covers the proper ground. Statistical validity, in 
turn, is a more widely used and probably a more important concept 
than psychological and logical validity. 


Curricular validity 


The first of the three types of methods used primarily in deter- 
mining the validity of educational tests is curricular validation. A 
teacher who carefully and thoughtfully selects a standardized test 
or constructs an informal objective examination or any other evalu- 
ation instrument for his class is attempting to insure curricular 
validity by making certain that the test deals with the types of 
educational outcomes he wishes to measure and is at the proper level 
of difficulty for his pupils. There are various sources of evidence to 
guide the teacher in considering test validity from the curricular 
standpoint. Among these are textbooks, courses of study, reports of 
national committees, and the writings of subject and test specialists. 
The idea in each case is that analysis of these source materials will 
furnish evidence concerning the thinking of qualified educators on 
questions dealing with course objectives and that such an analysis 
affords an important objective basis for determining the outcomes 
to test. 

Textbook and course of study analyses. The major weakness in 
the analysis of textbook and course of study content as a validation 
method is that it tends to perpetuate faulty and inadequate curricu- 
lar content if such defects exist in the source materials. It does not 
look beyond present practices. On the other hand, the overlapping 
of instructional material that is common to a large number of 
textbooks and courses of study almost certainly represents impor- 
tant content. 

Recommendations of committees and subject and test specialists. 
Reports of national committees and writings of subject and test 
specialists often serve as good guides to content in educational test 
construction. Such reports and recommendations are usually based 
on carefully formulated statements of instructional objectives and of 
desired learning outcomes stated in terms of pupil behavior. These 
source materials often provide excellent foundations for standardized 
and informal objective test construction, and the use of modern 
reports and writings by recognized committees and specialists is un- 
likely to result in perpetuating the errors in past practices. 

Local determination of objectives and outcomes. Much emphasis 
in modern schools is properly placed on teacher participation in the 
formulation of instructional objectives and of resulting behavioral 
outcomes. Although there are some guiding principles and some core 
areas of instructional objectives and practices having wide applica- 
bility in all schools, there are also many significant differences in 
communities relatively close to each other geographically. These dif- 
ferences may justify varied patterns of objectives, of instructional 
materials and methods, and of behavioral outcomes in the pupils of 
any two such schools. After the teachers and other school officials 
have formulated both general and specific objectives and have iden- 
tified resulting outcomes, the natural next step is the selection or the 
construction of tests and techniques that will measure the degree to 
which pupils have attained the desired outcomes. In fact, the degree 
to which the general outcomes proposed are capable of objective 
evaluation may well be a significant criterion of their suitability. 

Great care must be exercised in the application of the last two of 
these three methods of curriculum validation of a test, fruitful as 
they are when appropriately used, if the test is to possess the ex- 
pected degree of validity. Instructional objectives have often been 
in the past, and sometimes are even today, stated in such vague and 
inexact terms that it becomes impossible to obtain a sufficiently 
precise understanding of their meaning for effective use in test selection 
or construction. Furthermore, instructional outcomes as envisioned 
often in the past and even sometimes today may be stated 
in terms of outcomes expected or desired some years in the future 
rather than in the near or immediate future. Even the indirect meas- 
urement of such remote or ultimate outcomes presents a task for 
which there is at present no feasible method. Outcomes of an opera- 
tional and definite type, having meanings readily understandable 
by all, are most significant for the test selector or test constructor. 
Similarly, outcomes attainable in the near future, although con- 
ceived in terms of later realization of the ultimate objectives, seem 
most meaningful for the teacher in making or selecting tests. These 
points are developed more fully in Chapter 5 of this book but they 
are mentioned here because of their importance in an approach to 
test validation. 


Statistical validity 


A second method of validating tests is by means of statistical 
techniques. Methods frequently used involve the determination of 
the correlation between test scores and such criteria as teachers' 
marks, ratings of expert judges, scores on other tests designed for 
the same type of use, and measures of success on certain types of 
future outcomes. Basic to this method is the belief that the test is 
valid if high correlations are obtained between scores on it and the 
criterion measures, and implied is the belief that the criterion meas- 
ures may be accepted as measurement standards. Correlation coeffi- 
cients obtained from the types of situations named above are called 
coefficients of validity. 
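
As a minimal sketch of how a coefficient of validity of this kind might be computed, the following Python example correlates a set of test scores with a criterion measure by the Pearson product-moment formula. The scores, the numeric teachers' marks, and the function name `pearson_r` are illustrative assumptions and do not come from any actual test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one test score and one criterion mark per pupil.
test_scores = [52, 61, 47, 70, 65, 58, 74, 43]
teacher_marks = [78, 85, 70, 92, 88, 80, 95, 68]

validity_coefficient = pearson_r(test_scores, teacher_marks)
print(round(validity_coefficient, 2))
```

A coefficient near 1 would suggest close agreement between the test and the criterion. As the text notes, the figure is only as trustworthy as the criterion measure itself.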

Correlation with school marks. The method of validation by cor- 
relation with school marks assumes that in the long run a test has 
validity if the pupils’ scores on it are closely related to their achieve- 
ment in the subject. That is, a test in language must have con- 
siderable validity if pupils whose school marks in the subject are 
consistently high make the superior scores on the tests and if pupils 
whose school marks in the course are low usually make the inferior 
scores on the test. In spite of the apparent unreliability of teachers’ 
marks for refined measurements, an educational test that consistently 
picks out the pupils who, in the teacher's judgment of a specific 
ability, are superior or inferior, probably does have significant 
validity. 

Correlation with ratings of expert judges. This procedure is 
related in many respects to the one discussed above. To the extent 
that teachers' marks are the judgments of experts, the two 
procedures are identical. 

Correlation with other known measures. This method may be 
utilized in fields in which extensive critical work in test develop- 
ment has already been done. There would be reason to doubt the 
validity of a factual achievement test in American history that did 
not show some relationship to achievement of knowledge outcomes 
as measured by other valid tests in this subject. This is particularly 
true in the content subjects as contrasted with the skill subjects. This 
method of test validation is most frequently used when an 
outstandingly superior test is available to serve as the criterion. For example, 
the individual intelligence test constitutes the best basis at present 
for the validation of group intelligence tests. 

Correlation with measures of future outcomes. This method of 
validation is used primarily with prognostic and sometimes with 
aptitude tests. As the purpose of a prognostic test is to predict future 
outcomes, e.g., the success of ninth-grade pupils in a course in first- 
year algebra, the degree to which scores on the test are related to 
measures of the outcomes the test attempts to predict indicates the 
validity of the test. 

Another group of validation methods primarily statistical in na- 
ture but not involving correlation coefficients is based on differences 
in test scores made by pupils having different subject matter back- 
grounds or levels of maturity. The two such methods discussed below 
are used primarily by the maker of standardized tests. 

Accomplishment of widely spaced groups. One of the readily rec- 
ognized evidences of validity in test content is the power of such 
material to reveal significant differences in the accomplishment of 
widely spaced groups. For example, a performance test for use in the 
eighth-grade woodworking shop might be validated by administering 
it to groups of eighth-grade pupils who have had a semester of shop- 
work, and to similar eighth-grade pupils who have taken no indus- 
trial work in this field. If the test is valid in content, the differences 
in the scores made by the two groups should be significant. It is 
assumed, of course, that the pupils have actually learned something 
in the semester course in shopwork. This procedure is frequently 
used in the validation of aptitude tests and of other tests in which 
rather highly specialized skills are involved. 
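
The comparison of widely spaced groups can be sketched in Python as follows. The text does not name a particular test of significance; Welch's t statistic is used here only as one common choice, and the shop-test scores and the function name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    # Sample variances (n - 1 in the denominator) for each group.
    var_a, var_b = variance(group_a), variance(group_b)
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical scores on an eighth-grade woodworking performance test.
with_shopwork = [34, 38, 41, 36, 40, 39]      # one semester of shopwork
without_shopwork = [22, 27, 25, 30, 24, 26]   # no shopwork

t_value = welch_t(with_shopwork, without_shopwork)
print(round(t_value, 1))
```

A large positive value indicates that the trained group scored well above the untrained group relative to the spread of scores, which is the kind of evidence this method looks for.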

Rise in percentage of success. This method is based on the changes 
that education and maturation bring about. A valid reading test is 
expected to show significant increases in scores indicative of in- 
creased achievement as the tests are used in successive school grades. 
If twelve-year-old children do not demonstrate a higher level of 
mental maturity than eleven-year-old children on the same test, there 
is reason to question the validity of the test of mental ability. 
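
A rough check of this kind might be sketched in Python as below, assuming the same test has been given in several successive grades. The grade labels, the scores, and the function name `means_rise_with_grade` are hypothetical, and a real study would also ask whether each rise is statistically significant.

```python
from statistics import mean


def means_rise_with_grade(scores_by_grade):
    """Return True if the mean score increases from each grade to the next."""
    grades = sorted(scores_by_grade)
    means = [mean(scores_by_grade[grade]) for grade in grades]
    return all(later > earlier for earlier, later in zip(means, means[1:]))


# Hypothetical reading-test scores for three successive grades.
reading_scores = {
    7: [31, 35, 29, 38],
    8: [36, 41, 39, 44],
    9: [45, 43, 50, 48],
}

print(means_rise_with_grade(reading_scores))
```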

Social utility. The validation of content in terms of social utility 
assumes that the course of study itself is based on that point of 
view. This procedure is distinctly in line with modern theory in 
curriculum construction. An example of this approach to spelling 
test construction is the use of words that exhaustive word counts 
have shown to be most widely used in written language, and 
therefore words that the pupils need most to be able to spell correctly. 
Also, home mechanics tests might be based in part on the skills, such 
as fixing a leaking water tap, hanging a window weight, or wiring 
a buzzer, that activity analyses have shown to be most frequently 
required in the maintenance of household equipment. 


Psychological and logical validity 


There are certain subjects in which it appears to be impossible to 
secure an objective or statistical basis of validation. In general, these 
subjects are in the complex fields made up of many interrelated 
abilities as contrasted with those practical skill areas where the tested 
performance either is an exact representation of, or a very similar 
substitute for, the instructional outcome sought. Analysis both of 
the desired outcome and of the proposed test by psychological and 
logical methods may well reveal a sufficient degree of commonality 
or of similarity to justify the belief that the test constitutes a valid 
measure of the outcome. Such methods are followed quite frequently 
in such complex fields as language and the reading-study skills areas. 


2 RELIABILITY 


A test is said to be reliable when it functions consistently. The 
reliability of an examination depends on the efficiency with which 
a test measures what it does measure. This statement may appear 
on the surface to conflict with, or to repeat, the statement in the 
preceding section concerning the validity of an examination. Such is 
not the case, however. A test may satisfactorily test what it does 
test without to any effective degree testing what its user attempts 
to test. However, it cannot efficiently measure what it attempts to 
measure unless it efficiently measures whatever it does measure. This 
is equivalent to the statement that a test may be reliable without 
being valid but that it cannot be valid unless it is reliable. Therefore, 
reliability is really an aspect or a phase of validity. 

When a reliable test is used with the type of pupils and for the 
purpose for which it is intended, it will also be valid. This concept 
is fundamentally a restatement of the fact brought out in the above 
section, that validity is specific and that it depends not only on 
test content but also on the proper use of the test. Thus, reliability, 
even though it is an aspect of validity, is general in nature. 
Reliability, in turn, has two aspects, adequacy and objectivity. These will 
be discussed later in this section. 

Reliability is most frequently expressed by the use of the coeffi- 
cient of correlation. In each of the four methods presented below 
for obtaining or estimating the reliability coefficient, it is the inter- 
nal consistency or self-consistency of the test that is being evaluated. 
Only the general methods of obtaining the coefficients and discus- 
sions of their applications are given here. The statistical procedures 
involved in obtaining the various coefficients are presented in Chap- 
ter 14. 

Reliability coefficient. The method of determining the reliability 
of a test is by means of correlating scores on two equivalent forms 
of the same test given successively by the same procedure to the 
same group of pupils. The resulting measure is called the coefficient 
of reliability. Thus, as is true of the validity coefficient, the reliability 
coefficient is simply a special application of the coefficient of corre- 
lation. Students interested in making a critical analysis of the relia- 
bilities of standardized tests should doubtless do so on the basis of 
the correspondence of scores on two forms of the test. The resultant 
coefficient is likely to be safe and to be free from the factors making 
for artificially high relationships that sometimes result from less 
critical methods. 
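
The equivalent-forms procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the two score lists are hypothetical, and the helper name `pearson_r` is an assumption introduced here, not something taken from the text. It computes the ordinary product-moment correlation between scores on the two forms.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of the same eight pupils on two equivalent forms.
form_a = [42, 35, 50, 28, 39, 45, 31, 47]
form_b = [40, 37, 48, 30, 36, 46, 29, 49]

reliability = pearson_r(form_a, form_b)
print(f"Coefficient of reliability: {reliability:.2f}")
```

The same helper would serve for the retesting coefficient discussed next, with the two lists holding scores from the first and second administrations of a single form.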

One method of estimating test reliability when two forms of the 
test are not available or cannot conveniently be given makes use of 
the retesting coefficient. This coefficient, which is also a special ap- 
plication of the coefficient of correlation, is sometimes used when 
only one form of a test is available. The test is given to the group 
of pupils twice under similar testing conditions and the retesting 
coefficient is the correlation coefficient between the two sets of scores. 
The second administration of the test should not too quickly follow 
the first, for a significant increase of scores may result from memory 
of the previous experience with the test, but neither should it be 
delayed until forgetting has operated to a high degree. In any event, 
some increase of scores will probably result from the practice effect. 
Lindquist pointed out that this method is in general unsatisfactory, 
especially for achievement tests, and that it results in a spuriously 
high coefficient.³

3 E. F. Lindquist, A First Course in Statistics, Revised edition. Houghton Mifflin
Co., Boston, 1942. p. 219-20. 
A second method of estimating the reliability of a test is by means
of the chance-half coefficient. The test is given to a group of pupils
and their scores are then obtained for two arbitrarily determined
halves of the test. Usual methods of dividing a test into chance-halves
are: (1) obtaining separate scores on the odd-numbered and
on the even-numbered items, or (2) obtaining separate scores on
items 1, 4, 5, 8, 9, 12, 13, etc., and on items 2, 3, 6, 7, 10, 11, etc., to
equalize difficulty of the two half-scores when the items are in a
scaled order of difficulty. The correlation coefficient obtained between
the two sets of scores indicates the degree of conformance
between the two chance-halves of the test. The reliability coefficient
which would be expected for a test as long as the two halves combined
is then estimated from this correlation by using the
Spearman-Brown Prophecy Formula.
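
As a rough sketch of the chance-half procedure, the Python below splits a small set of item responses into odd-numbered and even-numbered halves, correlates the half-scores, and steps the result up with the Spearman-Brown formula in its usual form, r_full = 2r / (1 + r). The response data are hypothetical and the function names are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_brown(r_part, factor=2):
    """Estimate reliability of a test `factor` times as long as the part."""
    return factor * r_part / (1 + (factor - 1) * r_part)

# Hypothetical results: one row per pupil, one column per item (1 = right).
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]

odd_scores = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, 7
even_scores = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, 8

r_half = pearson_r(odd_scores, even_scores)
r_full = spearman_brown(r_half)
print(f"Chance-half correlation: {r_half:.2f}")
print(f"Estimated full-test reliability: {r_full:.2f}")
```

For any positive half-test correlation below 1, the stepped-up estimate is larger than the half-test value, which reflects the greater length of the whole test.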

This method of estimating test reliability has been popular in 
the past, since it involves a relatively small amount of labor and 
expense. Lindquist pointed out that the coefficients of reliability esti- 
mated by this method are less dependable than those obtained by 
correlating scores on two forms of a test and are also likely to be 
spuriously high.⁴ Despite that fact, this is one of the most feasible
methods for use with informal objective examinations for which 
ordinarily no second or alternate form is available. 

The third method of estimating test reliability furnishes a footrule 
coefficient which may in some cases be an underestimate but which 
is never an overestimate of the reliability coefficient. Called a “Foot- 
rule” coefficient because it admittedly is not the most accurate 
method, it requires the use of only three facts and measures from 
the test in a simple formula—the arithmetic mean and standard 
deviation of the scores and the number of items in the test.⁵ Because
of its simplicity and because it furnishes a result of sufficient accu- 
racy for many purposes, this method is recommended for use by 
teachers in estimating the reliability of their informal objective 
examinations. The method of computing this coefficient is presented 
in Chapter 14. 
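
For orientation, Kuder-Richardson Formula 21, cited in the footnote, is commonly written as r = [k / (k - 1)] [1 - M(k - M) / (k s²)], where k is the number of items, M the mean, and s the standard deviation of the scores. The sketch below assumes that form; the function name and the example figures are hypothetical, and the computation given in Chapter 14 may be arranged differently.

```python
def footrule_reliability(mean, sd, n_items):
    """Kuder-Richardson Formula 21 estimate from mean, SD, and item count."""
    variance = sd ** 2
    item_term = mean * (n_items - mean) / (n_items * variance)
    return (n_items / (n_items - 1)) * (1 - item_term)

# Hypothetical 50-item informal objective test: mean 30, standard deviation 8.
estimate = footrule_reliability(mean=30, sd=8, n_items=50)
print(f"Footrule reliability estimate: {estimate:.2f}")
```

Only three figures from the test are needed, which is why the method suits the classroom teacher who has no second form available.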

As has been suggested above, estimates of reliability coefficients 
often result in spuriously high or low statements of test reliability. 
The reliability coefficient itself must be based on a known and appropriate
range of ages or grade placement of pupils if it is to mean
what it purports to mean. Hence, the reliability coefficient is neither
an entirely adequate device nor, for that matter, the only method
of indicating the internal consistency of a test.

4 Ibid. p. 218-19.
5 G. F. Kuder and M. W. Richardson, “The Theory of the Estimation of Test
Reliability,” (Formula 21). Psychometrika, 2:151-60; September 1937.

Standard error of measurement. The other increasingly popular
device by which test reliability can be estimated is the standard error 
of measurement. This standard error indicates the degree of accuracy 
existing in the test score obtained for each pupil on a test. Accuracy 
here does not relate to the type resulting from lack of errors in com- 
puting the scores but rather to the magnitude of sampling errors of 
the type discussed and illustrated in the following section of this 
chapter. Since the standard error of measurement is not affected by 
the range of talent of the pupil group on which it is based, as is the 
reliability coefficient, it is coming to be recognized as a more concrete 
way of indicating test reliability than is the reliability coefficient. 
Methods of obtaining this measure of reliability are developed in 
Chapter 14. 
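
A commonly used expression for the standard error of measurement is s × sqrt(1 - r), where s is the standard deviation of the scores and r is the reliability coefficient. The sketch below assumes that expression; it is offered only as an illustration with hypothetical figures, and it is not necessarily the exact procedure developed in Chapter 14.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """Standard error of measurement from score SD and reliability."""
    return sd * sqrt(1 - reliability)

# Hypothetical test: standard deviation 8 score points, reliability .84.
sem = standard_error_of_measurement(sd=8, reliability=0.84)
print(f"Standard error of measurement: {sem:.1f} score points")
```

The result is expressed in score points, so it can be read directly against an individual pupil's obtained score.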


Adequacy 


The careful test maker never assumes that the instrument he has 
constructed is capable of measuring all of the factual knowledges 
or skills that a pupil has acquired in a school course. There are too 
many by-products and incidental learnings to make this possible. 
Good teaching should never stress a certain restricted body of facts 
to the exclusion of all other knowledge. When not only factual knowl- 
edges and skills but also concepts, understandings, applications, and 
tastes and preferences are considered, all significant types of instruc- 
tional outcomes, the task of measuring all of the outcomes from any 
course, any instructional unit, or even any single class period be- 
comes hopeless. At best, a test is a sample of certain portions of the 
total behavior which the examiner considers vital to pupil mastery 
in the field. Just as a grain buyer samples a carload of wheat by 
taking samples from different places in the car and grading the 
samples in order to obtain a measure of quality for the whole carload, 
a test constructor measures the educational attainments of pupils
by constructing test items that represent widely the types of pupil 
outcomes expected and accepts the scores resulting from their use 
as representative of the pupils’ relative achievements for the entire 
area sampled by the test items. Adequacy is the degree to which a
test samples sufficiently widely into the subject that the resulting
scores are representative of relative total performance in the areas
measured.

The diagram of Figure 2 is used to show the effect of sampling
on the reliability of test exercises based on a certain limited field
of information. Each of the 40 rectangular spaces in the diagram
represents an item of information Pupil A has had an opportunity
to learn. The 28 shaded and the 12 unshaded rectangles represent
respectively the items he has and has not mastered. Thus, he has
correct information on 28, or 70 per cent, of the 40 items.

Fig. 2. The principle of sampling

If a very limited sampling comprised of items 1 to 5 is taken at 
one end of the field of information, Pupil A will be equipped to 
answer correctly only item 5. However, if the last five items, 16 to
20, are selected for the test, he should be able to answer correctly 
all except item 18. Thus, through sampling alone there is a variation 
from 20 per cent to 80 per cent of correct responses. Again, if he is 
separately tested on one test composed of all of the even-numbered 
items, and another composed of all of the odd-numbered items, he 
should have right answers for five items and eight items respectively, 
giving him percentage scores of 50 and 80. Finally, if he is tested on
all twenty numbered items, he will answer thirteen correctly and 
consequently have a percentage score of 65. It is to be noted that as 
the number of items is increased the pupil's success on the test more 
nearly approaches the actual amount of his information in this field. 
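
The same principle can be illustrated by a small simulation. The sketch below is not a reproduction of Figure 2: instead of the particular item selections described above, it draws items at random from a hypothetical field of 40 items of which the pupil has mastered 28, and reports how widely the percentage score varies for tests of different lengths. All names and the number of trials are assumptions made for illustration.

```python
import random
from statistics import pstdev

def percent_correct(mastery, sample_size, rng):
    """Percentage score on a test of `sample_size` items drawn at random."""
    sampled = rng.sample(mastery, sample_size)
    return 100 * sum(sampled) / sample_size

rng = random.Random(0)               # fixed seed so the run is repeatable
mastery = [1] * 28 + [0] * 12        # 28 of 40 items mastered (70 per cent)

spread_by_size = {}
for size in (5, 10, 20, 40):
    scores = [percent_correct(mastery, size, rng) for _ in range(2000)]
    spread_by_size[size] = pstdev(scores)
    print(f"{size:2d} items: scores from {min(scores):.0f} to "
          f"{max(scores):.0f} per cent, spread {spread_by_size[size]:.1f}")
```

As the test grows longer the scores cluster more closely around 70 per cent, and a test covering all 40 items gives exactly 70 per cent on every trial.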

The use of percentage scores in the above illustration is not to be 
taken as condoning or accepting a percentage basis of marking. The 
purpose here is to make clear that the extent of the sampling which 
the items in a test represent is an important factor in the accuracy 
of the resulting scores. If the sampling is small, the scores are likely 
to be unfair to some pupils; if the sampling is ample, the scores are 
likely to be fair to all pupils. 

The above illustration shows that the accuracy or consistency of 
test scores depends on the extent of the sampling. It was pointed out 
above that a reliable test must measure consistently. Therefore,
adequacy of sampling is essential to reliability in a test, and adequacy
should be considered as a phase or aspect of reliability.


Objectivity 

A test is objective when the scorer’s personal judgment does 
not affect the scoring. The need for elimination of the subjective 
factor in the marking of examinations was recognized early in the 
growth of the testing movement. This recognition was one of the 
major factors contributing to the development of the standardized 
and informal objective tests. Objectivity in a test makes for the
elimination of the opinion, bias, or judgment of the person who
scores it.

In general, objective test items are so worded that only one answer
satisfies the requirements of the statement. The distinct advantage of
selecting highly objective items for use in educational tests is that
there can be little or no disagreement on what is the correct answer.
This means that outside of purely chance errors there should be no
variation in the scores assigned to a given test by different persons
or by the same person on different occasions.

The effect of objective items on the accuracy of marks is shown 
by an analysis of the marks assigned by a group of ten teachers to 
two examinations in geography written by the same pupil over the 
same subject matter. One paper consisted of his answers to 10 essay
questions, while the other gave his answers to 40 true-false items. 
Since each teacher marked the papers independently and at different 
times, there was little chance for the mark assigned previously to 
carry over. The range of scores shown in Table 1 for the essay examination
was from 76 to 90, the average of the ten scores was 83,
and the average amount by which the ten scores deviated from 83 
was three score points. On the other hand, one score of 30, one score 
of 32, and eight scores of 31 were assigned to the true-false test. 


TABLE 1. Scores assigned by ten teachers to an essay and a true-false
examination over the same material in geography

                               Scores Assigned
Teacher                 Essay-type          True-false
                        (10 Questions)      (40 Items)
A to J                  [individual scores not legible]
Average                 83                  31
Average error           3                   0.2
(deviation from
the average)
Thus, the average amount by which the scores deviated from the average
score of 31 was but two-tenths of a score point. The relative
objectivity of the two types of examinations is shown definitely by 
these findings. 
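
The figures reported for the true-false test can be checked directly from the scores stated in the text (one score of 30, one of 32, and eight of 31). The short sketch below computes the average and the average deviation from it; the function name is an assumption made for illustration. The ten individual essay scores are not listed in the text, so they are not reproduced here.

```python
def average_deviation(scores):
    """Mean absolute deviation of the scores from their own average."""
    mean = sum(scores) / len(scores)
    return sum(abs(s - mean) for s in scores) / len(scores)

# True-false scores assigned by the ten teachers, as stated in the text.
true_false_scores = [30, 32] + [31] * 8

mean_score = sum(true_false_scores) / len(true_false_scores)
print(f"Average score: {mean_score:.0f}")                                   # 31
print(f"Average deviation: {average_deviation(true_false_scores):.1f}")    # 0.2
```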

Objectivity, as well as validity and reliability, of a test may be 
expressed by the use of the correlation coefficient. The coefficient ob- 
tained between scores or marks assigned to a group of papers by the 
same individual at two different times is sometimes called the coeffi- 
cient of objectivity. However, this coefficient is less widely used than 
are those for estimating validity and reliability, inasmuch as the fact 
is quite obvious that the best types of objective test items are rela- 
tively high in objectivity. 

What was pointed out above as being true for adequacy of sampling
is also true for objectivity: both are essential to test reliability
and both are therefore aspects or phases, although independent ones, 
of reliability. 


3 PRACTICALITY 


A good examination must also possess certain characteristics of a
quite practical nature. These characteristics of administrability, 
scorability, and economy have to do with ease of administering the 
test and scoring the results and with the requirements in labor and 
financial outlay entailed in using the examination effectively. 


Administrability 


While administrability is not one of the major criteria, it is never- 
theless one worthy of much practical consideration. Administrability 
is the characteristic of a test that is concerned with ease, clarity, and 
uniformity in its administration and in preparation for its admin- 
istration. 

Ease of administration must be evaluated from two distinct points 
of view—that of the test administrator and that of the pupils taking 
the test. Specifications should be complete and precise both for 
advance preparations and for actual test administration. Definite provisions
should be made for the preparation, distribution, and collection
of test materials, for oral instructions by the examiner preceding,
during, and at the end of the examination, for written directions to
the pupils covering the test as a whole and for each separate part,
for timing of the test or test parts, and for a variety of other factors.

Instructions to the pupils, whether they are written, oral, or both 
written and oral, should be simple, clear, and concise. Any unusual 
types of items or test elements of complex nature should be intro- 
duced by sample items and illustrated by practice exercises. The test 
format should be such that pupils will have no difficulties in reading 
the items, in recording their answers, in moving from one page to 
the next or from one part to the next, and in various other practical 
uses of the testing materials. Illustrations should be clear-cut and 
easily tied up with the appropriate test items. The page size, length 
of line, size and style of type, and other mechanical features should 
be such as to facilitate rather than hamper the administration of 
the test. 

The provision of direct and simple methods of recording, trans- 
lating, and interpreting the results of the test is another important 
aspect of ease of administration. Otherwise useful instruments some- 
times involve complex processes in turning the raw scores into prac- 
tical and useful forms. 

For a standardized test provisions of these types should be made 
in the test booklet and manual. For informal objective tests, per- 
formance tests, essay tests, and other evaluation procedures and tools 
great care should also be taken in standardizing the examination 
procedure by the advance preparation of specifications for admin- 
istration. Assurance that a test possesses the characteristics of ad- 
ministrability discussed above is best given by adequate printed or 
written specifications which are made available to, and are followed 
strictly by, the test administrator. 


Scorability 


The results of a test possessing scorability should be obtainable 
in as simple, rapid, and routine a manner as is commensurate with 
their importance. It is desirable that tests be subject to accurate 
scoring by clerical workers or other persons not conversant with 
their content. Various methods of facilitating the scoring of tests, 
and thereby increasing their scorability, have been devised. Among 
these methods, discussed in Chapter 5 of this volume, are the use of 
prepared keys, the use of separate answer sheets to be scored by 
hand, and the use of separate answer sheets to be scored by machine. 
A convenient form of answer key or stencil should be provided
for standardized tests, and the manual of directions should carry 
complete instructions for scoring the instrument. The scoring keys 
should be arranged so that easy and accurate scoring of the tests can 
be accomplished. Properly spaced answers on scoring keys for in- 
formal objective examinations can be prepared by filling in the cor- 
rect answers on a copy of the test and converting it into a set of 
strip keys, cutout stencils, or a combination of the two, according 
to the nature of the test parts. 


Economy 


Economy is certainly not one of the major criteria of a good test,
but it is a factor that must be given consideration. Real economy 
in testing will not be achieved by indiscriminate use of cheap tests 
or testing methods, but it is equally true that the most costly instru- 
ments and methods are not necessarily the best. In the last analysis, 
the economy of a testing program should be computed in terms of the 
validity of the tests per unit of cost. 

There are many devices by which costs of testing can be kept low 
without reducing the effectiveness of a measurement program. Infor- 
mal objective tests can be prepared by use of the mimeograph or 
gelatine plate, and some types may even be given by a blackboard 
method or orally. The economies of time made possible through the
use of some of the scoring devices mentioned in a preceding section 
of this chapter result in real financial saving. Cooperative testing 
programs operating under institutional or public educational aus- 
pices in many of the states offer testing services to the schools at 
very low rates. Test booklets that are not necessarily destroyed by 
one use are now available for many standardized tests, whether 
machine-scoring or hand-scoring is used. Therefore, an effective test- 
ing program need not be dependent on great financial outlay. 


4 COMPARABILITY


A test possesses comparability when scores resulting from its use
can be interpreted in terms of a common base that has natural or
accepted meaning. There are two means whereby comparability of
results is established for standardized tests: (1) availability of
duplicate forms of the test, and (2) availability of adequate norms.
Standardized tests should be accompanied in the test manual or 
elsewhere by adequate tables of norms adapted in type to the age 
and grade levels for which the test is intended and to the types of 
abilities it measures. By the use of such norms, individual pupils 
or class groups can be compared with average performance for pupils 
of similar age, of similar grade placement, or who are taking the 
same course. By the use of duplicate forms of a test, results from 
testing before and after a period of instruction can be made com- 
parable without the necessity of using the same test twice. 

Comparability of results can be established for informal objective 
tests by the simple statistical procedures presented in Chapter 13.
In a sense, a series of duplicate forms is established when different 
class groups are tested over a period of several years, even though the 
tests used from year to year may overlap considerably in content. 
In a sense, also, norms can be statistically established on the basis 
of results from any but very small classes, although such norms do 
not possess the reliability and wide significance of norms for stand- 
ardized tests that are based on extensive pupil populations. 

The importance of comparability of test results is great, for without
comparability of measures some of the major values resulting
from the use of tests are lost. 


5 UTILITY


A test may possess adequately all of the important characteristics
of a good test discussed above and yet be of relatively little value
for use in a particular school situation. A test possesses utility to the
degree that it satisfactorily serves a definite need in the situation in 
which it is used. Unless tests are selected or constructed for definitely 
conceived purposes and their results used in an intelligent attempt 
to bring about the desired results, they are of little value and may 
even, in fact, be harmful. The modern teacher has a definite purpose 
in mind when he administers tests and makes as effective use as 
possible of the results in the guidance of his pupils. 

If the test is standardized, simple illustrations of the methods of 
interpreting and using the results should be given in the manual. 
If the test is one which the teacher constructs, its utility depends 
largely upon the foresight of the teacher in so planning the test and 
its use that the results will serve the needs of the local classroom. 
Utility may in a sense be considered a final master criterion. It is
certainly not entirely distinct from the other criteria, but it may be
an effective final check on the value of the test.


Topics for Discussion


1. What is meant by the validity of an examination? Define or explain
   in several ways. Is it a general or a specific concept?

2. Discuss and illustrate the two major methods by which validity is
   obtained in a test. What is the final or ultimate basis on which test
   validation depends?

3. How does the concept of validity differ for standardized and informal
   objective tests?

4. What cautions should be observed in the formulation of instructional
   outcomes as a basis for test validation?

5. Define or explain reliability as a criterion of a good examination.
   Is it a general or a specific concept?

6. Briefly discuss the methods by which the reliability coefficient of
   a test can be obtained or estimated. Consider the relative merits
   of the several methods.

7. Is a valid test necessarily reliable? Explain. Is a reliable test
   necessarily valid? Explain.

8. Show how test adequacy is essential to reliability. How is adequacy
   assured?

9. Show how objectivity contributes to reliability. What are the specific
   features of an objective test?

10. By what means is administrability obtained in a test? Is this an
    important criterion of a good examination?

11. How may scorability be obtained in an examination?

12. What is the importance of economy as a criterion?

13. What is meant by comparability as a criterion of a good examination?
    What are the two major means of attaining comparability?

14. Explain how norms or their equivalent are essential to an examination
    that possesses comparability.

15. In what way is utility in a sense a master criterion of a good
    examination?

16. Review the criteria of a good examination and show why a good
    test must be properly balanced in all respects if it is to serve its
    purpose efficiently.

17. How do the criteria of a good examination apply to other types of
    measuring instruments and techniques?
Constructing and Using
Standardized Tests 


THE FOLLOWING PROBLEMS in the construction and use of standardized 
tests are considered in this chapter: 


Meaning of the standardization process. 

Controlling validity and difficulty of test items.
Equating test forms. 

Deriving norms for standardized tests. 

Establishing final validity and reliability of tests. 

Using standardized tests in instruction, guidance, supervision,
and administration.

Instructional uses of educational tests. 

Diagnosis and analysis as related to remedial instruction. 
Planning the testing program. 

Selecting and administering the tests. 

Securing and using the results in the classroom. 
1 CONSTRUCTING STANDARDIZED TESTS


A treatment of the problems of constructing and refining stand- 
ardized tests that would be sufficiently detailed to afford an adequate 
guide to the inexperienced worker who might have ambitions to construct
such a test would run far beyond the confines of a single
chapter in this textbook. It would be a volume in itself. This chapter,
however, should be adequate to make the student or the classroom
teacher more critical of all types of standardized measuring instruments,
and at the same time more appreciative of the technical skill
and expense in time and money required to produce a commercial 
educational test of a quality adequate to stand up under present-day 
criteria. 


Meaning of standardization 


Standardization, or the process of deriving comparative norms, 
is quite frequently designated as the single factor that distinguishes 
the formal standardized test from the informal objective test. How- 
ever, a program of standardization demands a more critical analysis 
of subject matter, a more careful formulation of exercise material, 
a more exacting refinement of the techniques of evaluating test items, 
more critical standards of equality of items and of test forms, and 
more rigid statistical analysis than are usual for the informal objec- 
tive test. Thus real differences in the two types of tests appear. It 
becomes clear that the mere derivation of a set of norms for a 
test does not in itself make the instrument a standardized test. The 
matter of securing norms that facilitate interpretation of the test 
results is undoubtedly the most important phase of test standardiza- 
tion, but it is only one of several important procedures that are 
closely related to the standardization process. 


Establishing validity of test content 


The maker of the standardized test faces the rather complicated 
problem of preparing an examination over content and for a group of 
pupils he has not specifically taught. To be reasonably certain that 
he is fair in the selection of items, he must take care to include only 
those aspects of the course that are very likely to receive instruc- 
tional emphasis. Naturally this requires at the outset the selection of 
test content that must be general enough to fit into any school 
situation in which the course is taught. 

The difficulties encountered in the selection of item content for 
the standardized test depend to a certain extent upon the nature of the 
skills, knowledges, concepts, understandings, or applications to be 
tested. If the field is one in which the objectives and outcomes are 
clean-cut and readily identified, the problem may be comparatively 
simple. In the case of arithmetic, a subject in which the fundamental 
facts and skills are well known, the selection of content suitable for 
use in a standardized test is a relatively simple matter. Many tests of
acceptable validity are available in this field. Exactly the reverse is 
true in certain content courses. The fields in which the instructional 
aims are specific or highly factual lend themselves readily to the construction
of standardized tests. Tests in fields in which the knowledges,
skills, attitudes, and other outcomes are of a more indefinite
nature are much more difficult to validate. 

In most subjects the validity of the test content is very difficult 
to establish by acceptable statistical or other objective means. In 
certain fields it is practically impossible. In one subject a certain 
validation procedure may be effective and acceptable; in another it
may be completely unsuited for use. Makers of standardized tests
have resorted to many different types of validation procedures, some
of which are discussed in a later chapter.


Constructing and validating test items 


The discussion of the problems of constructing and using informal 
objective tests in Chapter 7 points out certain principles that are to 
be observed in the selection of the item content for any type of 
objective test, whether standardized or informal objective. Therefore, 
methods used by the makers of standardized tests in the selection of 
objective item types to use for each fact, principle, relationship, or 
outcome which they wish to test and in the actual construction of 
various item types are discussed only briefly here.

Test validity depends upon (1) the validity of the content in
general and (2) the validity of the individual items of which the test 


pointed out in the following pages of this chapter. Objective evidence 


Objectivity. Objectivity in the test item is such an important
element in the reliability of measurement which the test affords that 
it would be difficult to conceive of a standardized test made up of 
items not characterized by the quality known as objectivity. In 
general, objectivity is determined by the form in which the test item 
is stated. The builder of a standardized educational test faces the 
very difficult problem of determining the precise form of objective 
technique that best fits the subject he wishes to test. In most cases 
this is a problem that can be answered only by experimentation and 
as the result of experience. 

After the instructional areas to be covered by the standardized 
test have been determined, the test maker must proceed to analyze 
the subject matter into elements representing the basic concepts. 
These important elements may then be stated in some objective
form, the form selected depending to a certain degree upon the 
nature of the objectives and outcomes of the course and somewhat 
on the maturity level of the pupils with whom it is to be used. 
Frequently it is necessary to prepare certain of these basic concepts 
in two or three different objective forms, selecting for final use the 
types that perform best under experimental conditions. In addition 
to the item type to which the content to be tested is best adapted, 
there are three other factors that characterize the objective test item. 
These are briefly discussed below. 

1. Uniformity of response. Test items which otherwise appear to
meet the requirements of objectivity frequently are so stated that 
they allow considerable variation in response. This weakness in the 
test items is more likely to be found in recall or completion items 
than in any other forms. Since items of these types are rarely used 
in standardized tests today, it may be sufficient to point out here 
that, other things being equal, items that set up conditions encouraging
a multiplicity of answers should be eliminated or made more
objective. 

2. Sparing use of clues and suggestions. One of the common 
criticisms of the objective test item is that it contains many sugges- 
tive elements or clues that the pupils soon learn to recognize as 
indicating a certain type of response. Unquestionably this is a 
significant criticism of many objective forms. In formulating objec- 
tive items, great care should be taken to see that undue suggestion of 
the truth or falsity of an item is not inherent in the form of the 
statement, although this is frequently very difficult to avoid. Some 
experience in the making of tests of the alternate-response types 
indicates that the false or negative statements are more difficult to 
formulate and are more likely to contain recognizable clues. The 
systematic use of such terms as always, never, no, not, as well as 
such prefixes as un- and in- in the negative forms of items may easily 
lead the pupil to spot them at once as clues. 

3. Freedom from ambiguity. The elimination of ambiguity, or 
the possibility of misinterpretation, is one of the most difficult prob- 
lems the test maker has to face. To keep ambiguity out of an item 
frequently means that he must simplify the statement. When the 
concept itself is simple, it is almost impossible to escape making the 
statement so obvious that its validity as a test item is reduced. 
Another aspect of the problem of ambiguity in test items is the fact 
that there are certain items which to the ignorant or poorly informed 
pupil are perfectly straightforward and clear but which to the 
critical and well-informed pupil involve implications that cloud the 
issue. That is, the better the pupil is informed in the field represented 
by the item the more likely he is to be confused by it. This is a 
phase of ambiguity in the statement of items that is closely tied up 
with item difficulty. This point is discussed below as one of the major 
criteria for the selection of test items. 

Difficulty. The difficulty of a test item is usually expressed in terms 
of the number or percentage of pupils of a certain classification who 
respond to it correctly. The determination of the optimum difficulty 
of the test items to be used in a standardized test is a problem on 
which there is not complete agreement among test specialists. Some 
test authorities prefer approximately equal numbers of items at all 
levels from very easy to very difficult, while others prefer to use a 
few easy and a few difficult items but to have the majority near the 
50 per cent difficulty level. They are in general agreement, however, 
that the test as a whole should have about 50 per cent difficulty for 
the average pupil. 

The common practice in test construction is to attempt to prepare 
items covering a wide range of difficulty, from very easy to very 
difficult. Items are not suitable for inclusion in the test if they are 
so easy that no pupil of the type on which the tests are to be used 
fails to respond correctly. The presence of such items would merely 
serve to lengthen the test without adding to the reliability of its 
measurement. In a similar way, items that are so difficult that no 
pupil is able to respond correctly should not be included in the test. 
Thus, items that lie at the extremes of difficulty, 100 per cent failure 
and 100 per cent success, are useless, since no one is able to tell how 
far beyond these limits the difficulties may lie. An item that does not 
in a very direct way serve to differentiate between pupils at various 
levels of achievement has no place in an educational test, since it 
adds only useless dead weight. 
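
The difficulty computation described above can be sketched briefly in Python. The layout of the try-out data and the function names are assumptions made for illustration only; they are not part of any published procedure. Difficulty is expressed as the percentage of pupils answering each item correctly, and items that every pupil passes or every pupil fails are flagged as adding nothing to the test.

```python
def item_difficulty(responses):
    """Return the percentage of pupils answering each item correctly.

    `responses` is a list of per-pupil lists; each entry is 1 (correct)
    or 0 (incorrect). This layout is an assumption made for illustration.
    """
    n_pupils = len(responses)
    n_items = len(responses[0])
    return [
        100.0 * sum(pupil[i] for pupil in responses) / n_pupils
        for i in range(n_items)
    ]


def useless_items(percent_correct):
    """Indexes of items that every pupil passes or every pupil fails."""
    return [i for i, p in enumerate(percent_correct) if p in (0.0, 100.0)]


# Hypothetical try-out data: 4 pupils, 4 items.
tryout = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
difficulty = item_difficulty(tryout)
flagged = useless_items(difficulty)
print(difficulty)  # [100.0, 50.0, 0.0, 25.0]
print(flagged)     # [0, 2]
```

In this invented example the first item is passed by all pupils and the third by none, so both would be dropped before the test is assembled.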

Modern practice in the arrangement of standardized test items 
tends to follow the procedure of presenting items covering a wide 
range of difficulty in ascending order from the very easy to the most 
difficult. This plan makes it possible for the lower-grade or less able 
pupils to respond to certain items within their level of mastery with- 
out being unduly discouraged by being confronted at the outset with 
exercises of prohibitive difficulty. On the other hand, it also causes 
the more able pupils to waste a certain amount of time working 
through a large number of items that are time-consuming but are 
not hard enough to bring out their real abilities. The allowance of 
liberal working periods for such tests tends to take care of this 
difficulty somewhat. Thus each pupil is allowed to work long enough 
to reach the level at which his abilities are taxed to the utmost. If 
the test items are carefully scaled in such a way that there is a 
gradual and continuous rise in difficulty, a relatively small amount of 
time is lost by the superior pupils in working on items that are far 
below their abilities. Similarly, the levels of ability of the less 
accomplished are revealed quite promptly and accurately. The prob- 
lems involved in the scaling or statistical evaluation of test items 
are rather technical and require a much more extensive treatment 
than can be justified in this volume. 

Discriminative power. The basic function of all educational measurement 
is to place individuals along a defined scale in accordance 
with differences in their achievement. Such a function implies high 
discriminative power on the part of the test. Since tests are made up 
of separate items, it is clear that each item comprising a test must 
have this quality in a maximum degree if the total test is to pos- 
sess it. 

Discriminative power in a test or a test item means that a different 
quality or magnitude of response may be expected from individuals 
or groups possessing the abilities in question in varying degree. 
Pupils with superior ability should answer the item correctly more 
often than should inferior pupils. This suggests a method by which 
the power of a test item to discriminate or distinguish between 
groups of pupils may be determined. The practical implications of 
this procedure may be illustrated quite simply. 

An experimental test has been given to a class of 100 pupils having 
the normal range of ability in the subject. The tests have been 
corrected by the use of the answer key, and the score of each pupil 
in terms of the number of items answered correctly has been determined. 
On the basis of these scores, the class of 100 pupils may be 
divided into three groups. The 27 per cent of the pupils making the 
highest scores constitute the superior group; the 27 per cent making 
the lowest scores comprise the inferior group. The 46 per cent of the 
class in the middle are not considered in computing this index of 
discrimination. The use of the 27 per cent comprising each of the 
extremes follows a proposal made by Kelley and further exploited 
by Flanagan¹ for this purpose. The next step involves an item 
count for all of the items in the test, showing the number and per- 
centage of pupils in the superior group compared with similar data 
for the inferior group. A summary of a brief sampling of items from 
a typical test is given in Table 2. 


TABLE 2. Discriminative power of test items in percentages of success 
by superior and inferior groups 

         Superior     Inferior 
         Group        Group        Index of 
Item     High 27%     Low 27%      Discrimination 

  1         12            4            .24 
 12          6            4            .08 
 23          8           14           -.13 
 44         10           18           -.15 
 55         24           13            .15 
 76         42           22            .23 
 97         52           12            .46 
108         80           36            .46 
129         90           86            .08 
140         92           40            .59 


1 John C. Flanagan, “General Considerations in the Selection of Test Items and 
a Short Method of Estimating the Product-Moment Coefficient from the Data at 
the Tails of the Distributions.” Journal of Educational Psychology, 30:674-80; 
December 1939. 

This table indicates that Item 1 was answered correctly by 12 per 
cent of the superior and by 4 per cent of the inferior pupils. This 
item thus shows great difficulty and a limited power to discriminate 
between good and poor achievement. The fact that the item is 
answered correctly by such a small proportion of all pupils (average 
8 per cent) indicates that its difficulty is great. Item 44, however, 
is correctly answered by a smaller percentage of superior pupils 
than of inferior pupils. This is shown by the negative discrimination 
index of -.15. This suggests either that the item is at fault or that the wrong 
facts have been taught in this subject. The item should probably be 
eliminated from the test. Items 97, 108, and 140, with positive 
indexes of .46, .46, and .59, are probably good enough to retain in the 
test. 
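
The counting procedure just described may be sketched in Python as follows. This is a minimal illustration under stated assumptions: the data layout and the function name are hypothetical, and the index computed here is the simple difference between the two proportions of success. It is not the Flanagan coefficient reported in Table 2, which is an estimate of a product-moment correlation, so the values will not agree with those in the table.

```python
def upper_lower_analysis(responses, fraction=0.27):
    """Compare item success in the highest- and lowest-scoring groups.

    `responses` is a list of per-pupil lists of 1 (correct) / 0 (incorrect).
    Returns one (pct_superior, pct_inferior, difference) tuple per item.
    The difference of proportions is a simple stand-in index, not the
    Flanagan coefficient tabulated in Table 2.
    """
    # Rank pupils by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(fraction * len(ranked)))  # size of each extreme group
    superior, inferior = ranked[:k], ranked[-k:]
    results = []
    for i in range(len(ranked[0])):
        pct_sup = 100.0 * sum(p[i] for p in superior) / k
        pct_inf = 100.0 * sum(p[i] for p in inferior) / k
        results.append((pct_sup, pct_inf, (pct_sup - pct_inf) / 100.0))
    return results


# Hypothetical try-out data: 7 pupils, 4 items (each extreme group has 2 pupils).
tryout = [
    [1, 0, 0, 0],  # total 1
    [1, 1, 1, 0],  # total 3
    [1, 0, 1, 0],  # total 2
    [0, 0, 0, 1],  # total 1
    [1, 1, 0, 0],  # total 2
    [1, 1, 1, 0],  # total 3
    [1, 0, 0, 1],  # total 2
]
for item, row in enumerate(upper_lower_analysis(tryout), start=1):
    print(item, row)
```

In this invented example the fourth item is answered correctly more often by the low group than by the high group and so receives a negative index, the same warning sign shown by Items 23 and 44 in Table 2.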

This method of determining the discriminative power of test 
items is widely used in the critical analysis of test items for stand- 
ardized tests. The classroom teacher who is interested in the experi- 
mental development and analysis of informal objective examinations 
will also find in the method illustrated a very satisfactory procedure 
for determining the quality of test items. 


Methods of equating test forms 


Two or more forms of an educational test are considered to be 
equated when practically identical scores on each are made by the 
same individuals or by individuals of the same ability. This means 
that the forms of the test must be made up of test items that 
parallel one another closely in difficulty. In practice, such close 
equality of item difficulty in alternate forms is obtained in one of 
three ways. 

1. The first procedure involves the preparation of large numbers 
of items covering the total range of the outcomes to be tested, on 
the chance that there will be a sufficient number of items at each of 
many difficulty levels to permit the pairing of items of equivalent 
difficulty in the alternate forms of the test. When this is done, the 
alternate forms of the test may be considered roughly equal in 
difficulty, but there will be only a very general and limited equiva- 
lence of content. 

2. The second procedure involves the preparation of parallel 
items on certain selected, important concepts. One item may test 
the identification of the concept, while the other may test the identi- 
fication of an additional phase of the concept or some phase of the 
identification of the procedure involved. 

3. A third procedure that permits the establishment of comparable 
forms of tests by the use of derived scores is mentioned here, 
although the complexity of the statistical techniques necessary and 
the variety of derived scores which are used in this way make a 
complete presentation impracticable at this point. It may suffice 
here to say that the derived scores are so established that they have 
constant meanings, whether or not they are obtained on the same 
form of the test or from the same pupil group, and that the method 
of establishing a “normalized group” is basic to the procedure. 
Several of the most widely used derived scores that are in general 
based on this type of procedure are presented in Chapter 13. 


Deriving test norms 


Norms provide the user of a standardized test with the basis for 
a practical interpretation and application of the results. Unless the 
norms that accompany a test reflect an accurate picture of typical 
accomplishment, they are useless and they render the test itself 
useless. 

Early in the history of objective testing, practically all that the 
development of a standardized test required was to give a few test 
exercises to a hundred or more pupils in different school systems. 
The results were then compiled and submitted as norms. The stand- 
ardized test differed from a reasonably good informal objective test 
mainly in the fact that the former had been tried out with more 
pupils in a larger number of different classes. In fact, many informal 
examinations of the objective type meet all criteria of standard tests 
except that of having norms for the evaluation of their scores. How- 
ever, test standardization as it is now interpreted means much more 
than the mere derivation of norms, although the existence of norms 
is still the most distinctive feature of the standardized test. 

Norms are tables of information necessary for the interpretation 
of test scores and are obtained by giving the particular test to a 
large and representative sampling of pupils in the same grades and 
of a type similar to the groups with which teachers will use the 
tests. To the extent that the sampling used in obtaining the norms 
was distributed over a large population in typical school situations 
and that the conditions under which the tests are to be administered 
are rigidly followed by the teachers using the tests, the norms furnish 
a reliable and useful basis for interpretation. 

Types of norms. The form in which the norms for a test are provided 
depends to a large degree upon the level in the school system 
at which the test is used. The norms are also conditioned somewhat 
by the nature of the test itself. Tests that are designed for use in 
the elementary-school grades are usually accompanied by age norms 
and grade norms and sometimes percentile norms based on grade 
placement. Tests intended for use in the secondary school are more 
frequently provided with percentile and grade norms only. Age 
norms do not seem to be particularly useful at the high-school and 
college levels, since so many factors other than age operate to 
affect achievement. Then, too, the curve of growth appears to 
flatten out quite rapidly after the sixteenth or seventeenth year, so 
that the increments of growth in achievement from age to age at the 
upper levels are relatively small. 

Brief discussions of grade norms, age norms, percentile grade or 
subject norms, and percentile norms for schools are given below. The 
brief illustration of how such norms are derived indicates roughly 
the procedure used by test makers in establishing norms for a test. 

1. Grade norms. The grade norms established for most of the 
commonly used achievement tests are based on the median scores 
obtained by giving the tests to large groups of pupils within each 
grade. Such norms provide a reasonably practical basis for the 
interpretation of class scores as well as of individual accomplishment. 
In the derivation of grade norms for standardized tests it is a 
common but not universal practice to express the norms in terms 
of end-of-the-year achievement. In any event, it is desirable to have 
the norms clearly indicate the period they are designed to cover. 
Grade norms thus provide a convenient means of expressing the 
approximate progress of the pupil through the grades by turning 
his raw score or standard score into a grade-equivalent score. For 
example, if the seventh-grade end-of-the-year norm for a certain 
test were 120 points, and the eighth-grade end-of-the-year norm 140 
points, a score of 130 points would be treated as representing achieve- 
ment halfway through the eighth grade, or an 8.5 grade equivalent. 
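
The arithmetic of this example can be written out as a short Python sketch. It assumes straight-line interpolation between successive end-of-the-year norms, which is what the halfway example implies; the function name and the form of the norm table are hypothetical. Finishing grade 7 is treated as grade placement 8.0, so that the seventh-grade norm of 120 corresponds to 8.0 and the eighth-grade norm of 140 to 9.0.

```python
def grade_equivalent(score, norms):
    """Linearly interpolate a grade equivalent from end-of-year norms.

    `norms` maps a grade to its end-of-the-year norm score. Finishing
    grade g is treated as grade placement g + 1.0, so a score equal to
    the seventh-grade norm corresponds to 8.0.
    """
    grades = sorted(norms)
    for lower, upper in zip(grades, grades[1:]):
        lo_score, hi_score = norms[lower], norms[upper]
        if lo_score <= score <= hi_score:
            fraction = (score - lo_score) / (hi_score - lo_score)
            return round(lower + 1.0 + fraction * (upper - lower), 1)
    raise ValueError("score lies outside the range covered by the norms")


# Norm values taken from the example in the text.
norms = {7: 120, 8: 140}
print(grade_equivalent(130, norms))  # 8.5
```

Scores outside the range of the norm table are rejected rather than extrapolated, since the text gives no basis for extending the norms beyond the grades tested.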

In many of the modern analytical tests composed of several parts, 
raw scores frequently are changed into standard scores before the 
grade norms are established. The data in Table 3, which is an 
abbreviated table of grade norms taken from the manual for the 
Iowa Language Abilities Test, show the grade equivalents corre- 
sponding to standard scores for each subtest and the median standard 
scores for this test. Raw scores on each part of this test are changed 
directly into standard scores as the scoring of each part is completed. 
The total score on all parts of the test is represented by the median 
of the several standard scores. In this table a standard score of 160 
on Test 1, Spelling, represents a grade equivalent of 8.7, while a 
similar score on Test 3, Language Usage, is assigned a grade equivalent 
of 8.3. A median standard score of 160 for the entire test gives 
the pupil an 8.4 grade equivalent. 

2. Age norms. At the elementary-school level, age norms of one 
type or another appear to provide a more adequate basis for the 
interpretation of individual pupil accomplishment than is possible 
with grade norms or percentile grade norms alone. The problem of 
establishing age norms has been complicated by a number of factors 
arising out of the generally inadequate child accounting practices in 
the schools and the indifferent attention to the significance of 
ageness as a factor in school progress and accomplishment. In its 
simplest form the preparation of age norms involves the regrouping 
of all pupils used in the grade tabulation into chronological age 
groups regardless of grade location or school progress. The test 
scores of these chronological age groups are then tabulated, and the 
means or medians computed. These results are then used as the 
basis for setting up tables of the scores corresponding to the several 
age groups. It is readily apparent, however, that many factors other 
than age are operating to influence the average achievement of pupils 
grouped in grades. Such factors as over-ageness, retardation, and a 
serious lack of balance between retardation and acceleration are all 
present. It is found, for example, that while the average chronological 
age of a seventh-grade pupil might be 13 years and 6 months at the 
end of the school year, the average test score of pupils of 13 years 
and 6 months is not at all the same as the end-of-the-year score for 
the seventh grade. The actual achievement of the under-age pupils 
is significantly superior to that of over-age groups in a given grade. 
For the makers of standardized tests the useful implication from this 
fact is that it makes very apparent the need for norms that will take 
into account wide differences in maturity, mental ability, or school 
progress within the grade. While many reputable standardized tests 
are accompanied by age norms determined without regard to the 
grade level at which the accomplishment takes place, it appears 
obvious that interpretations of individual pupil accomplishment on 
the basis of these norms are likely to be misleading. 
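
The simplest form of age-norm tabulation described above, in which pupils are regrouped by chronological age without regard to grade and a median is computed for each age group, can be sketched in Python. The record layout, the six-month grouping interval, and the function name are assumptions chosen for illustration, and the scores are invented.

```python
from collections import defaultdict
from statistics import median


def age_norms(records, months_per_group=6):
    """Median score for each chronological age group.

    `records` is a list of (age_in_months, score) pairs, pooled from all
    grades. Each group is labeled by its starting age in months.
    """
    groups = defaultdict(list)
    for age_months, score in records:
        start = (age_months // months_per_group) * months_per_group
        groups[start].append(score)
    return {start: median(scores) for start, scores in sorted(groups.items())}


# Hypothetical records: (age in months, raw score).
records = [(150, 40), (152, 44), (155, 48), (156, 50), (158, 47), (161, 55)]
print(age_norms(records))  # {150: 44, 156: 50}
```

As the text points out, norms obtained this way ignore the grade in which the pupil is located, which is the weakness that age-at-grade norms are intended to correct.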

TABLE 3. Grade equivalents corresponding to each subtest standard 
score and the median standard score for the Iowa Language 
Abilities Test² 

182 13.1 182 
181 12.8 1st 
180 12.5 13.0 180 
179 12.3 12.6 13.0 179 
178 12.0 12.3 12.6 178 
177 11.8 12.0 12.2 177 
176 11.6 11.7 11.8 13.0 13.0 -- . 176 
175 11.3 11.4 11.5 12.2 12.6 12.6 175 
174 Кї Th 12.8 11.2 11.4 12.1 12.0 174 
173 II.0 10.9 11.6 10.9 10.9 11.7 11.5 173 
172 10.8 10.7 ILI 10.6 10.5 12.7 IL4 11.1 172 
171 10.6 10.5 10.7 10.4 10.2 11.9 ILI 10.7 171 
170 10.4 10.3 10.4 10.2 9.9 11.4 10.8 10.4 170 
169 10.2 10.1 10.1 9.9 9.7 10.9 10.5 10.1 169 
168 10.0 9.9 9.8 9.7 9.5 10.5 10.2 9.9 168: 
107 9.9 9.7 9.6 9.5 9.3 10.1 10.0 9.7 107 
166 9. 9.5 9.4 9.3 9.1 9.8 9.8 9.5 160 
165 9.5 9.3 9.2 9.2 8.9 9.6 9.6 9.3 165 
164 9.4 9.1 9.0 9.0 8.7 9.4 9.4 9.1 164 
163 9.2 9.0 8.8 8.9 8.6 9.2 9.2 8.9 163 
162 9.0 8.8 8.6 8.7 8.4 9.0 9.0 8.7 162 
161 8.9 8.6 8.4 8.6 8.3 8.8 8.8 8.6 161 
160 8.7 8.5 8.3 84 8.1 8.6 8.6 8.4 160 
159 8.6 8.3 8.1 8.3 8.0 8.5 8.4 8.3 159 
158 8.4 8.2 8.0 8.1 7.8 8.3 8.3 LE 158 
157 8.3 8.0 78 8.0 7:7 8.2 B.1 8.0 157 
156 8.1 7.9 T 7.9 7.6 8.0 7.9 7.8 156 
155 8.0 78 TS 73 7.5 79 78 77 155 
154 7.8 7.6 7.4 7.6 7.3 7.7 7.6 7.6 151 
153 77 7.5 72 7:5 72 76 7$ 74 125 
152 7.5 73 74 7.3 71 7.5 7.3 73 + 152 
151 7.4 7.2 7.0 7.2 7.0 7.4 TIE 7.2 151 
150 7.3 74 6.8 LB: 69 7.2 7.0 71 150 
149 71 7.0 6.7 ae 6.8 71 6.8 6.9 149 
148 7.0 6.8 6.6 6. 6.7 7.0 6.7 6.8 148 
147 6.9 6.7 6.5 6.7 6.6 6.9 6.5 6.7 147 
146 6.7 6.6 6.4 6.6 6.4 6.8 6.4 6.6 146 
145 6.6 6.4 6.2 6.4 6.5 6.7 6.3 6.5 145 
144 6.5 6.3 6.1 6.3 6.2 6.6 б.т 64 144 
143 6.4 6.2 6.0 6.2 6.1 6.4 6.0 6.2 143 
142 6.2 6.1 5.9 6.0 6.0 6.3 58 б.т 142 
141 6.1 6.0 5.8 59 5.9 6.2 5.7 6.0 141 
140 6.0 5.9 5.6 5.8 5.8 6.1 5.6 5.9 140 
139 5.9 5.8 5-5 5.7 5.7 6.0 54 5.8 139 
138 5.8 5.7 54 5.6 5.6 59 5.3 5.7 138 
137 5.6 5.5 5.3 5.4 5.5 5.8 5.2 5.6 137 
136 5.5 54 5.2 5.3 54 5.7 50 5.5 136 
135 5.4 5.3 5л 52 5.3 5.6 49 54 135 


2 H. A. Greene and H. L. Ballenger, Directions for Administering: Iowa Language 
Abilities Test, Intermediate. World Book Co., Yonkers, N. Y., 1948. Table 10. 

3. Age-at-grade norms. In the ordinary process of test standardization 
the establishment of age-at-grade norms involves a number of 
difficulties having to do with (1) availability of pupil population and 
(2) statistical procedures resulting from inadequate population 
groups. While the number of sixth-grade pupils who are between ten 
and eleven years of age would represent a large portion of the normal 
sixth-grade population, the number of under-age individuals in the 
sixth grade who would be nine, eight, or seven years old, and the 
number of over-age pupils of eleven, twelve, thirteen, or fourteen 
years of age would fall off very rapidly. Thus, in order to secure 
reliable age-at-grade norms for all ages within the grade very large 
numbers of individuals must be tested in order to secure adequate 
populations in the fringe areas. Otherwise, estimations by extrapola- 
tion must be made. While not all of the assumptions on which such 
estimations are based can be definitely established in practice, it is 
probable that the injustice done by such estimations is much less 
serious than would be the failure to provide such differential norms 
in the first place. 

An example of norms given in both age and grade equivalents 
within the grade is shown in Table 4 for Test C, Language, of the 
Iowa Basic Skills Tests, Advanced. In the interpretation of this test, 
raw point scores are turned directly into grade equivalents. Thus in 
this table the first digit of each two-place number in the grade 
columns is to be read as the grade location, and the second digit as 
the proportion of the pupil's progress toward the next grade. For 
example, the typical second-semester fifth-grade pupil with a chronological 
age of 13 years and 0 months has a grade equivalent expectancy 
of 43, or 4.3, according to this table. It is possible to determine 
quickly from this table the grade-equivalent score expectancy for the 
various age groups. 

It is interesting to note here that, as almost invariably happens, the 
younger pupils within the grade make the higher grade-equivalent 
scores up to a certain point, and the older pupils make the lower 
scores. For example, the table shows that a first-semester sixth-grade 
pupil who is 13 years and 6 months old has a grade-equivalent score 
of 4.8. On the other hand a child in the same grade but chronologically 
three years younger (10 years, 6 months) has a grade equivalent 
of 6.5. While the typical thirteen-and-a-half-year-old in the second 
semester of the sixth grade has a grade equivalent of 5.3, a typical 
child of that age would have a grade equivalent of 10.1 if he were 
in the second semester of the ninth grade. It must be apparent that 
age-at-grade norms offer very useful means for the interpretation of 
individual pupil accomplishment, especially in the case of pupils 
who for one reason or another may be over-age or under-age for the 
grade. 


TABLE 4. Age-at-grade norms for the total language score of the 
Iowa Basic Skills Tests³ 


                Grade Equivalents 
                       for 
         Grade and Semester (Beginning) 

  Age 
Yr. Mo.  5-2  6-1  6-2  7-1  7-2  8-1  8-2  9-1  9-2 
17 © 69 
16 9 3b 70 
16 6 JEA 72 
16 3 ТА 
ТЫ ЫЕ 156. 57295 
15 9 67 74 77 
15 6 62 68 76 80 
x5 "3 63 69 78 86 
15 0 57 64 71 79 89 
14 9 . 58 65 74 83 94 
14 6 54 58 66 74 89 98 
14 3 56 бо 68 76 92 тог 
14 O 51 56 62 70 83 93 102 
13 9 52 57 64 73 87 94 102 
i3 6 .. 48 53 59 66 78 88 94 тог 
1345 48 53 бі 68 83 89 96 
13 о 43 48 54 62 73 83 90 94 
12 9 44 49 56 65 77 84 90 
12 6 44 52 58 70 78 85 89 
12 3 45 52 60 73 79 86 
12 0 47 53 65 73 79 84 
її 9 47 57 67 74 80 
ii 6 48 бт 68 44 77 
Ir 3 52 64 69 74 
Ir о 56 64 68 73 
10. 9 57 64 68 
10 6 58 63 67 
10 3 57 63 
10 0 57 62 
9 9 57 
9 6 56 


3 Manual of General Information: Iowa Every-Pupil Tests of Basic Skills, 
Advanced. Houghton Mifflin Co., Boston, 1947. 

4. Percentile grade or subject norms. Relative accomplishment of 
individual pupils within a grade or who are taking a certain course 
may also be shown very clearly by turning raw or standard scores 
into percentile scores. This is done by computing the percentile 
values from the frequency tables for each grade or course distribu- 
tion and assigning percentile equivalents for each score. Percentile 
norm tables show for a wide sampling of pupils in a certain grade 
or course (1) the percentage of pupils exceeding each score or each 
of a number of equally spaced scores, or (2) the score below which 
certain percentages of pupils fall. Although percentile norms are 
customarily presented in one or the other of these methods, there 
is a great variety in the form of such tables. Percentile scores cor- 
responding to specific raw or standard scores may be reported by 
grades, by test parts and totals, or only the raw score or standard 
score equivalents for specified percentiles, quartiles, and deciles may 
be shown in more compact tables. Whenever percentile grade norms 
are provided it is recommended that these values be used in the 
interpretation of individual pupil scores rather than the grade equiva- 
lents from the usual grade norms. The overlapping of distributions 
of scores is so great from grade to grade on most standardized ele- 
mentary-school tests, and the differences between successive grade 
medians are often so slight, that grade equivalents may exaggerate 
differences and lead to unsound interpretations. A sixth-grade pupil 
assigned a grade equivalent of 8.5 does not belong in the eighth 
grade. It may be much more accurate to use percentile scores to 
describe his accomplishment as superior in relation to other sixth- 
grade pupils. 
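
The two customary forms of percentile norm table described above can be illustrated with a short Python sketch. The scores are invented, the function names are hypothetical, and the nearest-rank rule used for the second form is one common convention adopted here as an assumption; test publishers may interpolate within score intervals instead.

```python
def percent_exceeding(scores, value):
    """Form (1): percentage of pupils in the norm group scoring above `value`."""
    return 100.0 * sum(s > value for s in scores) / len(scores)


def score_at_percentile(scores, pct):
    """Form (2): the score at a given whole-number percentile.

    Uses the nearest-rank convention: the smallest observed score such
    that at least `pct` per cent of pupils score at or below it.
    """
    ordered = sorted(scores)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical raw scores for ten sixth-grade pupils.
scores = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]
print(percent_exceeding(scores, 20))    # 50.0
print(score_at_percentile(scores, 50))  # 20
print(score_at_percentile(scores, 90))  # 30
```

Either form describes a pupil in relation to others in his own grade, which is the interpretation recommended above in preference to grade equivalents.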

5. Percentile norms for school averages. In large city school sys- 
tems and in schools participating in state testing programs it fre- 
quently becomes desirable to interpret the results in terms of school 
averages. A comparison of one school with another in the same city 
or one system with another is not possible through the use of the 
norms based on individual pupil scores, because the 
variability of individual scores is so much greater than the variability 
of school averages. Percentile norms for school averages are obtained 
in much the same manner as other percentile norms are derived, 
except that averages are substituted for individual scores in the 
grade distributions. 
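
The difference in variability that makes separate norms necessary can be seen in a small Python sketch. The three schools and their scores are invented for illustration, and the standard deviation is used here as the measure of variability, which is an assumption; the text does not name a particular measure.

```python
from statistics import mean, pstdev

# Hypothetical scores for four pupils in each of three schools.
schools = {
    "School A": [40, 50, 60, 70],
    "School B": [45, 55, 65, 75],
    "School C": [50, 60, 70, 80],
}

# Pool every pupil's score, and separately take one average per school.
individual = [s for scores in schools.values() for s in scores]
averages = [mean(scores) for scores in schools.values()]

spread_individual = pstdev(individual)  # variability of pupil scores
spread_averages = pstdev(averages)      # variability of school averages

print(round(spread_individual, 1), round(spread_averages, 1))
```

In this invented case the school averages are 55, 60, and 65. An average of 65 is the highest of the three schools, yet four of the twelve individual pupils score above 65, so the same value would rank very differently against norms for individuals than against norms for school averages.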

Table 5 gives the percentile norms by grades for 
the school and building averages obtained for total scores on Test C, 
Language, of the Iowa Basic Skills Tests, Advanced. 

The values 53.5, 66.0, and 75.6 given for grades 3, 4, and 5 as the 
99th percentiles are all to be read as grade equivalents with the 
decimal points moved one place to the left. The second decimal place 
is reported here due to its special significance in these school 
averages. 

TABLE 5. Percentile norms for school averages on the total language 
score of the Iowa Basic Skills Tests⁴ 

%-ile                 Grade Equivalents 
            3      4      5      6      7      8      9 

 99       53.5   66.0   75.6    оо    99.8  103.0  107.0 
 95       49.5   60.8   69.6   84.2   95.7  100.8  105.5 
 90       47.0   58.0   66.0   80.0   91.3   98.4  104.5 
 85       45.5   56.5   64.5   77.2   88.3   96.4  103.2 
 80       44.3   55.2    РЕ    75.6   86.5   94.8  101.6 
 75       43.4   54.2   62.6   74.4   85.1   93.3   99.7 
 70       42.6   53.2   61.8   73.4   83.7   91.9   97.4 
 65       41.9   52.3    бо    72.4   82.5   90.6   95.6 
 60       41.3   51.4   60.2   71.4   81.4   89.3   94.2 
 55       40.6   50.5   59.3   70.5   80.2   88.0   92.8 
 50       40.0   49.6   58.4   69.5   79.1   86.8   91.6 
 45       39.3   48.8   57.5   68.6   77.9   85.5   90.2 
 40       38.6   47.8   56.6   67.6   76.7   84.3   88.7 
 35       37.9   46.8   55.6   66.5   75.6   83.1   87.1 
 30       37.2   45.9   54.5   65.3   74.4   81.9   85.3 
 25       36.4   44.9   53.5   64.1   73.2   80.6   83.3 
 20       35.5   43.9   52.4   62.8   71.8   79.2   81.4 
 15       34.4   42.8   51.0   61.2   70.2   77.7   79.4 
 10       33.1   41.6   49.4   59.2   68.2   75.9   77.1 
  5       31.6   40.0   47.2   56.3   65.0   73.3   74.4 
  1       28.5   36.0   43.3   51.8   59.0   67.0   69.0 

4 Adapted from a series of tables in the Manual of General Information: Iowa 
Every-Pupil Tests of Basic Skills, Advanced. Houghton Mifflin Co., Boston, 1947. 

It should be noted that the type of percentile norms illustrated in 
Table 5 is designed for use in the interpretation of school results and 
should not be used in interpreting individual pupil scores. To confuse 
these norms for school averages with percentile grade norms used 
in interpreting individual pupil achievement would introduce a 
serious error in test interpretation. 


Norms vs. standards 


The use of the term standardized in the discussion of tests of the 
type for which norms are provided has led to the development of a 
careless tendency to use the words “standards” and “norms” as 
synonyms. The process of securing the data for the critical analysis 
of tests and the derivation of suitable norms is properly known as 
standardizing. However, the term “standard,” when used to refer to 
a level of pupil achievement, implies an ultimate goal to be achieved. 
These standards may not actually be reached by any individual, but 
they are levels of achievement toward which to strive. Norms are 
the levels of achievement which typical pupils actually attain. When 
considered in the light of these definitions, it is clear that there are 
few tests which are accompanied by standards. It might be more 
nearly the truth to call the process of securing these comparative 
scores known as “norms” by the more descriptive name of “normalizing.” 

Possibly one of the best illustrations of the differences between 
standards and norms is to be found in the field of arithmetical computations. 
The standard of arithmetical accuracy is naturally 100 
per cent, for most such computations containing error are useless. 
However, the actual norm of arithmetical accuracy of computation 
on a well-known test is from 65 to 70 per cent for the junior 
high-school grades. That is to say, pupils of these grades work these 
particular kinds of arithmetical examples with an accuracy of from 
65 to 70 per cent instead of the desired ultimate goal of 100 per cent 
accuracy. 

It should be recognized that a norm does not necessarily represent 
a satisfactory level of achievement. This is particularly true of 
schools in which instruction and classroom environments are superior 
and in which pupils, largely because of their satisfactory home 
environments and heredity, have superior abilities. In any event, 
teachers should encourage pupils to make the most of their abilities 
and to surpass test norms whenever they can. Even when a class 
has average performance that is just at the norm on a test, repre- 
senting only the attainment of what is expected from a typical class, 
approximately half of the pupils will still be below the norm of 
achievement. 

Standards are of two general types. In the first place, there are 
certain standards of achievement, or minimum essentials, which 
have been fairly generally accepted by school people for such 
abilities as handwriting and, in less objective form, reading, spelling, 
and arithmetic. Although these standards are usually based on the 
results of standardized testing, and may make use of norm tables 
for their establishment, they frequently are conceived of as represent- 
ing the minimum quality and perhaps speed of performance that 
will adequately equip the pupil for post-school life. For example, 
the widely accepted standard in handwriting is a quality of 60 on the 
Ayres’ Scale for Measuring the Handwriting of School Children at 
the rate of 70 letters per minute. Although the quality of 60 at the 
given rate on this scale is approximately the norm for pupils com- 
pleting the sixth grade, it is also thought of as the standard or 
minimum ability that should be attained by all pupils before they 
finish school. 
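
A standard of this kind is a fixed threshold rather than a description of typical performance. As a rough sketch, the check below applies the handwriting standard quoted above (quality 60 at 70 letters per minute) to three pupils; the pupil names and scores are invented for illustration, and the function name `meets_standard` is hypothetical.

```python
# Standard quoted in the text: quality 60 on the Ayres scale at 70 letters per minute.
STANDARD_QUALITY = 60
STANDARD_RATE = 70


def meets_standard(quality, rate):
    """A pupil meets the standard only if both quality and rate reach the minimum."""
    return quality >= STANDARD_QUALITY and rate >= STANDARD_RATE


# Hypothetical pupils: (quality on the scale, letters per minute).
pupils = {"Pupil A": (65, 72), "Pupil B": (60, 58), "Pupil C": (50, 80)}
results = {name: meets_standard(q, r) for name, (q, r) in pupils.items()}

for name, ok in results.items():
    print(f"{name}: {'meets' if ok else 'does not meet'} the standard")
```

Both conditions must hold at once, so a pupil who writes quickly but poorly, or well but slowly, falls short of the standard.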

In the second place, the standard in any school subject or form 
of pupil achievement may be a definitely formulated, although prob- 
ably subjective, or even only a vaguely conceived, idea in the mind 
of the teacher or principal concerning his expectations of his pupils. 
In this sense, standards are extremely variable and differ from 
school to school, from teacher to teacher, and even, as his ideas and 
pupil groups change, from year to year for the same teacher. 

The modern emphasis upon providing for each child as an indi- 
vidual the type of instruction best adapted to his abilities, interests, 
and present and future needs, rather than upon the molding of all 
pupils into the same achievement pattern, has reduced the reliance 
of school people upon standards. The attempt is rather to furnish 
maximum aid to each child in the development of his potentialities 
and to evaluate his achievement in terms of himself as an individual. 


Establishing final validity and reliability 


The procedures discussed above, although sometimes complex and 
always time-consuming, are prerequisite to the final steps in the pub- 
lication of a standardized test. After these steps have been carried out, 
the final forms of the test given to a representative group of pupils, 
and the norms derived on the basis of their scores, it remains for 
the test maker to obtain final evidence concerning the validity and 
reliability of the test and the reliability of the norms. Although 
careful and accurate work on the preliminary steps should make 
reasonably certain that these important criteria will be satisfied, it is 
nevertheless essential that these steps be performed as a final check 
and that their results be reported to users and prospective users of 
the test to enable them better to evaluate it. 

The interlocking and complex nature of some of these final steps 
makes necessary only a brief presentation here of the most im- 
portant aspects of this final checkup. For the beginning student, 
Chapter 14 presents an adequately comprehensive treatment of the 
methods of determining the validity and reliability of tests. 

Validity of the test. If the test is one for which validity coefficients 
of one or more of the types discussed in Chapter 4 will be meaning- 
ful, such validity coefficients should be obtained. In some instances 
this might require the administration of some other test to the 
group of pupils on which the test is standardized and the comparison 
of results obtained. In other cases it may require a comparison of 
test scores with course marks or teachers’ ratings of pupils. Evidence 
concerning validity is also found in test norms that consistently 
show higher average scores with advancement in age and grade 
placement of the standardization group of pupils. In any event, 
evidence must be obtained directly or indirectly to show that the test 
measures what it purports to measure. 
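
One common form of validity coefficient is the product-moment correlation between scores on the test and some outside criterion, such as teachers' marks. The sketch below computes that correlation in Python. The eight pairs of scores are invented, and the choice of the product-moment formula is an assumption made for illustration, since the text leaves the methods to Chapters 4 and 14.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: test scores and teachers' marks for the same eight pupils.
test_scores = [52, 61, 47, 70, 58, 66, 43, 75]
teacher_marks = [78, 82, 70, 91, 80, 85, 72, 88]

r = pearson_r(test_scores, teacher_marks)
print(f"validity coefficient (r) = {r:.2f}")
```

A coefficient near 1 would indicate that pupils rank in much the same order on the test as on the criterion; a coefficient near 0 would indicate little agreement.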

Reliability of the test. Test reliability must be established for the 
final form or forms of the instrument. Techniques such as those 
presented in Chapter 14 and even some more refined methods might 
be used. The reliability of measurement, which gives an indication 
of the accuracy of the scores obtained on the test, should also be 
determined. The purpose of such procedures is to establish the fact 
that the test measures accurately and consistently. 

Reliability of the norms. One of the major problems in the deriva- 
tion of norms for standardized tests has to do with the reliability 
of the norms themselves. Possibly the statement of this problem 
would be made clearer if the word universality were substituted for 
reliability in the foregoing statement. Reliability implies consistency, 
but universality reflects the generally representative nature of the 
norms. An otherwise excellently made test may be limited in its 
usefulness through the fact that the norms are not sufficiently 
representative. It may be that it is hopeless to expect to produce norms 
which are so generalized that they represent suitable bases of com- 
parisons wherever they may be found or used. However, the only 
hope lies in one of two directions. One is to sample so widely in the 
possible areas of population likely to use the test that practically 
every type and character of pupil and school situation is included. 
The other is to recognize the practical difficulties in the way of mak- 
ing a general norm fit all types of situations, and to select the popula- 
tion used in the derivation of the norms to represent deliberately 
chosen types of school situations. It would be impractical to expect 
pupils from small school systems with little or no laboratory or shop 
equipment to achieve at a level comparable to that expected of pupils 
from schools with large, well-equipped laboratories. The solution of 
this problem may lie in the establishment of representative norms 
for different types of courses and schools. 


2 PRACTICAL USES OF STANDARDIZED TESTS 


The value of the educational test is directly proportional to the 
extent to which the results from its use are translated into improved 
instructional, guidance, and administrative practices in the school. 
If these practices bring about improvement in the conditions under 
which teachers teach and children learn, the primary functions of 
school administration and supervision will have been realized. While 
the problems of securing these results in the most effective and 
economical manner are treated in this chapter primarily in terms of 
the test as an instructional device, the guidance, supervisory, and 
administrative uses of these instruments must not be slighted. 


Instructional uses of standardized tests 


The value of standardized tests for guidance, supervisory, administrative, 
and research purposes has been emphasized so generally that very 
often the classroom teacher overlooks their real value 
in the solution of his own instructional problems. Yet this is where 
the most vital and important uses of such tests are to be found. The 
development of reliable, valid, and highly detailed measuring instru- 
ments has caused the teacher to modify his previous conceptions of 
the uses of standardized tests. Earlier experience with the more 
formal types of educational tests sometimes led the teacher to feel 
that tests were merely time-consuming devices used for checking up 
on his teaching efficiency, from which he received little or no con- 
structive help in the improvement of his instruction. Quite in contrast 
with this idea, the more modern conception of standardized tests im- 
plies their continuous use as instruction progresses. This means a con- 
tinuous testing program, for experience with the other conception of 
the use of tests indicates that only through continuous testing will 
standardized tests ever come to function at their highest efficiency 
as instructional instruments in the classroom. 

Class analysis and diagnosis. Very often a teacher, at the beginning 
of a school term, wishes to obtain advance information concerning 
the proficiency of his classes in certain subjects and their general 
preparation for the work. It is essential for him to know their weak- 
nesses and their strengths in some detail in order so to direct their 
work that the best results will be obtained. He needs to know the 
background his pupils have been given for the work they will be 
expected to master during the ensuing year. Most modern standard- 
ized tests permit this type of use. It is not always necessary for a 
teacher to employ a special diagnostic test to secure the required 
general data on the relative abilities of the class. For example, the 
ability to read silently is the basis of proficiency in so many subjects 
that the teacher should certainly secure a picture of the reading 
ability of the class. The results should indicate whether the class as 
a whole or the individual members of the class are able to interpret 
the printed page with facility, and so carry on their work without 
great assistance. It is also possible to test in a like manner for other 
general qualities, as well as for mastery of specific knowledge out- 
comes. 

Not only is this preliminary general diagnosis of great value to 
the teacher, but it has also been found desirable and valuable to 
check progress or advancement from time to time by means of 
objective tests. Tests of achievement will reveal whether the class 
as a group is moving together, or whether there are more or less 
well-defined sub-groups that seem to need special attention. Fre- 
quently such conditions furnish a justifiable basis for dividing a 
grade or class into sections for such corrective treatment. Where 
classes are divided into several sections, as is often done in larger 
schools, many competent educators feel that the pupils should be 
arranged so that groups of approximately equal ability are placed 
together. The objective test is the best means for making this adjustment 
so that pupils can move forward in groups of nearly equal 
proficiency. 

An illustration of this need is given in Table 6, which shows the 
rather startling range of ability found in a typical ninth-grade class for 
results from the Terman-McNemar Test of Mental Ability. A teacher 
confronted with a class ranging in mental ability from above twelfth 
grade to below seventh grade is faced by a hopeless task if he 
attempts to bring all members of this class up to the same level of 
proficiency. Particularly is this true when the range of ability is 
wholly unsuspected or measured, as it is in so many cases, by guess 
rather than by reliable tests. The systematic use of tests for the 
purpose of identifying class needs constitutes a real source of pro- 
fessional protection to the classroom teacher. 


TABLE 6. Distribution of mental ability in a ninth-grade class in terms of 
average grade placement 


Grade Location            Number of Pupils 
Above 12                         6 
12                               4 
11                              12 
10                               8 
9                               19 
8                                8 
7                                6 
Below 7                          4 
Total Number of Cases           67 
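

The spread shown in Table 6 can be summarized with a short tally. The sketch below, written in Python, uses the counts as transcribed from the table and groups them relative to the ninth grade; the grouping into "above," "at," and "below" is an illustrative choice, not part of the original table.

```python
# Counts transcribed from Table 6 (grade location -> number of pupils).
table_6 = {
    "Above 12": 6,
    "12": 4,
    "11": 12,
    "10": 8,
    "9": 19,
    "8": 8,
    "7": 6,
    "Below 7": 4,
}

total = sum(table_6.values())
at_grade = table_6["9"]
above_grade = sum(table_6[k] for k in ("Above 12", "12", "11", "10"))
below_grade = sum(table_6[k] for k in ("8", "7", "Below 7"))

print(f"total pupils: {total}")
for label, count in (("above grade 9", above_grade),
                     ("at grade 9", at_grade),
                     ("below grade 9", below_grade)):
    print(f"{label}: {count} ({100 * count / total:.0f}%)")
```

On these counts, fewer than one pupil in three is located at the grade in which the class is actually enrolled.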


Individual pupil diagnosis. Closely connected with the use of the 
test for pupil guidance is its use for the determination of the diffi- 
culties and variabilities of each individual pupil. While in general 
individual differences may not be so marked as to preclude reason- 
ably efficient class instruction, the more that is known about each 
child’s weaknesses and strengths the greater are the possibilities for 
success on the part of the teacher instructing the group. The test 
results should be studied especially in the light of each pupil’s indi- 
vidual attainments and points of difficulty. The critical analysis of 
each pupil’s test scores may very likely be a means of clearing up 
wholly unsuspected troubles that would otherwise continue to hamper 
the child and to reduce his chances for proper advancement. 
Although this type of individual analysis has great possibilities, it 
becomes increasingly valuable when it is definitely tied up with 
remedial material so devised that each child may be aided in cor- 
recting his own weaknesses. 

Examples of two diagnostic profile charts are given in the accom- 
panying illustrations of interpretative materials provided with two 
achievement tests of the survey type. Both profile charts serve 
analytic or general diagnostic rather than specific diagnostic func- 
tions, but it should be remembered that survey tests can be diag- 
nostic only in the broad sense discussed in Chapter 3. The profile 
chart for the California Reading Test furnishes places for recording 
graphically evidence of a pupil's grade placement on total reading, 
on the two sub-total measures of vocabulary and comprehension, and 
on seven part scores. The chart for the Gray-Votaw-Rogers General 
Achievement Test furnishes positions for the graphic recording of 
the pupil's grade placement on total achievement and on achieve- 
ment in ten separate major areas of elementary-school achievement. 


Fig. 3. Diagnostic profile chart for the California Reading Test ⁵ 


5 Ernest W. Tiegs and Willis W. Clark, California Reading Test, Intermediate, 
California Test Bureau, Los Angeles, 1950. 


Guidance uses of educational tests 


Schools are under constant criticism for their apparent failure to 
identify the special abilities of their pupils and to challenge these 
children to greater efforts. This is one aspect of educational guidance. 
Furthermore, it is charged that little or no attempt is made to 
direct children away from fields in which they apparently have little 
aptitude. With the modern objective devices now available for the 
measurement of general as well as specific abilities of children, 
neither of these situations needs to exist. Teachers, principals, and 
administrators have found that test records obtained early in the 
institution's contact with the pupil prove to be extremely valuable 
aids in handling disciplinary cases and in helping pupils to adjust 
themselves in many other ways. Most disciplinary problems arise 
through the failure of the school system properly to stimulate and 
occupy the pupil's mind. Many of the reasons for such difficulties 
may be made clear to the teacher by the wise use of properly selected 
tests. The necessary adjustments can then be made to correct a 
situation that need not exist if properly handled. 

In the modern school system, the guidance center serves also as a 
testing bureau. It must work in closest coordination with the adminis- 
trative office in the maintenance of an efficient child accounting 
system. Among the most important items entered in the records will 
be all types of readily interpreted results from physical and mental 
tests, aptitude and prognostic tests, personality inventories, and 
survey, analytic, and diagnostic achievement tests. The guidance 
specialist must therefore necessarily be well trained in the use and 
interpretation of measurement and evaluative tools and techniques. 


Administrative uses of standardized tests 


During the early period of the growth and development of educa- 
tional tests school administrators were the ones most directly con- 
cerned with their possibilities. Schools and classes were increasing 
in size, curricular offerings were expanding, educational costs were 
advancing, public interest in educational efficiency was increasing, 
and parents and teachers were growing more and more critical of 
the methods of evaluating pupil accomplishment and the marking 
systems then in use. Administrators themselves were becoming 
increasingly critical of some of their own practices. Students of 
education and taxpayers asked searching and frequently embarrassing 
questions. Communities demanded school surveys as a means of 
answering their own questions concerning the efficiency of their 
schools. Naturally during this period the administrator turned to 
the test as one objective means of handling his problems of public 
relations. 


[Chart not reproduced. Legible column headings include Elementary 
Science, Language, Literature, Spelling, Vocabulary, Comprehension, 
Arithmetic Reasoning, Arithmetic Computation, and Total Average.] 

Fig. 4. Individual educational chart for the Gray-Votaw-Rogers 
General Achievement Test ⁵ 


5 Hob Gray, David F. Votaw, and J. Lloyd Rogers, General Achievement Tests, 
Intermediate, Steck Co., Austin, Texas, 1950. 

Modern school administration demands that the most objective 
and reliable evaluative instruments available be used to provide the 
answers to the problems continually arising in connection with the op- 
eration of a school system. Often it is impossible to separate the 
administrative use of a test from an instructional or a supervisory 
use. The standardized test may be required to establish the adequacy 
of a given system of assigning and reporting teachers’ marks. A 
specific unit of instructional material or a new and unproved method 
of teaching may require experimental evaluation. The efficiency of 
the school system in terms of pupil growth per unit cost may require 
demonstration. These and other possible administrative uses of 
standardized tests by the administrator are listed in Section 3 of 
this chapter on planning the testing program. Only three adminis- 
trative uses are discussed here. 

Pupil gradation and placement. Administrators, supervisors, and 
teachers find the problem of pupil placement one of the most difficult 
situations they have to face. The indefinite lines of division between 
the grades and the wide overlapping of ability between grades reveal 
that the typical techniques of pupil classification now in use are 
extremely crude. This could scarcely be otherwise in view of the 
methods commonly used. The proper grade placement of pupils 
implies that insofar as possible individuals who are normal for their 
group should be placed together. This means that pupils who are 
approximately alike in their chronological age, their educational 
achievement, and their physiological, mental, social, and moral devel- 
opment should, where possible, be placed together for instructional 
purposes. Not all of these qualities lend themselves readily to objec- 
tive measurement, but a number of them do, and within these limits 
the results of objective measurements should be used in determining 
the pupil’s placement in his group. Through the development of 
reliable grade and age norms, based upon the achievement of groups 
of children on standardized tests, a valuable instrument for the 
establishment of grade lines and for within-class grouping is made 
available. 

By means of a simple procedure, results from several different 
tests of a battery can be combined into a graphic record and used 
as a valuable aid in pupil classification. An example of this procedure 
is given in Figure 4 on page 110. The technique is simple and valu- 
able, regardless of whether a complete reclassification of the school 
or grade is planned, or merely the proper placement of a few new 
pupils entering the system for the first time. 

Group comparisons. Since the earliest beginnings of group instruc- 
tion, classroom teachers have wished to know just how their pupils 
have compared in attainment with other similar pupils and classes. 
Until standard tests were developed, it was practically impossible 
to secure this information. Now the giving of standardized tests in 
arithmetic, spelling, reading, or other school subjects makes fairly 
easy a comparison of the results from a class with the norms 
established for the subject and grade. 

Comparisons with other classes within the system in which the 
teacher is working, within the same building, and even between 
different sections of classes in charge of the same or different teachers 
can be made on a basis of objective norms that have been derived for 
the various tests. Another comparison that is even more useful is 
that between the attainment of a class at the beginning and the end 
of a semester's or a year's work, or at shorter intervals in the course 
of a semester. Each of these comparisons has its own peculiar value 
in assisting the teacher to determine the relative attainment and 
progress of his class at a given time. 
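
The comparisons described above reduce to simple differences between class averages and norms. The sketch below, in Python, compares one class with the norm at the beginning and at the end of a year and computes the gain between the two testings. All scores and both norms are hypothetical grade-equivalent values chosen for illustration.

```python
# All figures below are hypothetical grade-equivalent scores, not published norms.
def mean(values):
    return sum(values) / len(values)


def compare_to_norm(scores, norm):
    """Return the class mean and its difference from the norm (positive = above)."""
    class_mean = mean(scores)
    return class_mean, class_mean - norm


fall_scores = [8.1, 8.6, 9.0, 9.2, 9.4, 9.9, 10.3]       # beginning of year
spring_scores = [8.9, 9.5, 9.8, 10.1, 10.4, 10.8, 11.2]  # end of year
FALL_NORM, SPRING_NORM = 9.0, 9.9  # assumed norms for the two testing dates

fall_mean, fall_gap = compare_to_norm(fall_scores, FALL_NORM)
spring_mean, spring_gap = compare_to_norm(spring_scores, SPRING_NORM)
gain = spring_mean - fall_mean

print(f"fall:   mean {fall_mean:.2f}, {fall_gap:+.2f} relative to norm")
print(f"spring: mean {spring_mean:.2f}, {spring_gap:+.2f} relative to norm")
print(f"gain over the year: {gain:.2f}")
```

The gain shows progress within the class itself, while the two gaps show whether the class kept pace with the norm over the same interval.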

Measuring the efficiency of learning. Such general comparisons as 
are cited above are of great value in themselves, but equally im- 
portant is the determination of ways and means by which the act 
of teaching itself may be improved. Ambitious teachers everywhere 
are looking for the best methods of instruction in their fields. Teach- 
ing methods, which in the last analysis should be studied in the 
classroom by the classroom teacher, can be evaluated effectively by 
means of standardized tests. Instructional units within the course of 
study should also be evaluated. The measurement of the effect of 
certain types of drill exercises and the determination of the specific 
strengths or weaknesses of groups or classes illustrate the uncounted 
opportunities for the administrative use of these valuable instruc- 
tional devices. 


Meaning and importance of educational diagnosis 


Educational diagnosis implies the use of more or less technical 
procedures designed to locate specific learning and instructional 
difficulties, and if possible to determine their causes. For the medical 
expert, diagnosis means the careful and extensive observation of the 
patient under controlled conditions. It includes the use of profes- 
sional instruments, such as the clinical thermometer, the stethoscope, 
and the microscope, which make possible exact and objective observa- 
tions. It means the assembly of a complete case history of the back- 
ground of the difficulty leading up to the present physical crisis. It 
is based on the examination and analysis of many similar cases, in 
order that common factors may be identified. For the teacher, 
diagnosis has many of the same implications, but unfortunately 
much of the exactness, objectivity, and precision of the medical 
diagnostician’s instruments appear to be missing in the teacher's 
equipment. Even today only a few objective measuring instruments 
capable of rendering reasonably precise diagnosis are available to the 
pedagogical diagnostician. The well-prepared modern teacher now 
has at hand reasonably adequate statistical techniques; analytical 
diagnostic tests in different subjects; diagnostic charts; instruments 
and devices for measuring aural acuity, visual acuity, eyedness, 
muscular imbalance in the eyes, binocular vision, and binocular 
fusion; and many other highly important qualities that may account 
for a pupil’s lack of progress in many fields of learning. It is thus 
apparent that diagnosis in education is moving rapidly in the direc- 
tion of scientific accuracy. 

The diagnosis of difficulties underlying educational accomplish- 
ments undoubtedly constitutes the high point in the supervisory and 
instructional uses of educational tests. Deficiencies of a general 
nature are revealed and brought to light by general survey tests. 
Specific weaknesses, and to a certain extent causes of weaknesses, are 
identified by the use of properly selected diagnostic tests. Practically 
all of the more exact types of diagnostic procedures, such as the 
location of defects in speech, hearing, and vision, are dependent 
upon educational test results for their initial steps. These points will 
be discussed much more thoroughly in the later chapters on reading 
and language. 

Analysis as the basis of diagnosis. The successful development of 
the many sets of habits that constitute the bulk of school learning 
depends upon the care with which the underlying and basic skills of 
the subjects themselves are recognized and utilized in the teaching. 
If it can be shown that teaching a child to add consists not only in 
developing the habit of responding automatically and correctly to the 
100 basic combinations but also involves higher levels of skill, such 
as knowledge of the higher decade addition facts, bridging of the 
tens, control of the attention span, and carrying from one column 
to the next, the teacher's task is made obvious and objective. Simi- 
larly, if it can be shown that silent reading comprehension is not a 
single isolated ability but a composite of many elements, such as 
knowledge of word meanings, ability to get meaning from sentences, 
ability to arrange thought units and sentence units into logically 
organized wholes, and ability to find desired material quickly, the 
teacher has a real basis for his instructional procedures. Language 
is another basic subject in which many delicately balanced skills are 
interwoven in an extremely complex manner. Here again the ele- 
ments of achievement in the total process must be identified. Blind 
trust in general practice on the total skill must necessarily give way 
to the exact identification and discovery of the particular points of 
pupil weakness as a basis for special emphasis. 

Good diagnosis must parallel the processes of good teaching. Effec- 
tive diagnostic materials in any school subject can be prepared only 
after the skills contributing to success in this field have been isolated 
and identified. Psychologically, the reason for this is that on the 
whole the child learns to do what he practices and not something 
else. Remedial work, accordingly, can function only when the point 
at which pupil mastery breaks down has been located. Thus the 
analysis must be penetrating and the diagnosis must be precise. 

Specific nature of diagnosis. Diagnosis must be more exact than 
broad statements of general functions. It is not enough to discover 
that a child is unable to read silently. The exact nature of his handi- 
cap must be revealed before it is possible to undertake a remedial 
program. The more specific the diagnostic information revealed, the 
more exactly the remedial material can be made to fit the need. To 
return to a frequently used illustration, it is found by diagnosis that 
the child is unable to add, but unless the exact point at which his 
mastery of addition breaks down can be determined by the diagnosis, 
teaching or remedial efforts are largely wasted. One of the outstand- 
ing reasons why more effective teaching and remedial work has not 
been done in certain fields is that no adequate analysis of basic skills 
can be made or has been made. Concrete illustrations of this need 
are given in connection with a related discussion in this chapter. 

Importance of the diagnostic use of test results. Tests as such are 
incapable of improving instruction because of any inherent power. 
Existing conditions are merely revealed by them, and these with the 
limitations implied by the validity and reliability of the particular 
instruments used. Remedial or corrective teaching is the result of 
deliberate constructive effort by the teacher after the particular 
points of weakness in the instruction of the pupils have been revealed 
by the tests. The ease, clearness, and directness with which these 
needs are revealed by the tests are a measure of their real educational 
value. Too few existing tests are so constructed as to permit the 
interpretation of their results directly in terms of an effective reme- 
dial procedure. However, this seems to be no good reason for the 
failure of teachers to apply more directly the results of this work in 
testing to the improvement of their teaching practice. Just as the 
data revealed by the navigator's instruments require calculation and 
interpretation, so is it necessary to analyze test data carefully in 
order to make them the basis of a genuine remedial program. 

The interpretation of test scores and the planning of remedial 
procedures are the most difficult parts of the use of standardized edu- 
cational test results. Moreover, they are by far the most important 
parts. One of the greatest needs in education today is the provision 
for genuine diagnostic testing in all instructional fields, supple- 
mented by valid remedial work designed to correct the weaknesses 
and defects of individual pupils as revealed by the tests. It is im- 
portant to learn, as a result of using tests in the classroom, that a 
pupil or the entire class is below the norm in the subject, but unless 
it is learned with some exactness what causes the low level of achieve- 
ment the testing program will do little if anything more than supply 
interesting information. Teachers and supervisors have a right to 
expect that something more constructive will be provided in exchange 
for the time required for classroom testing. 

Diagnosis the basis for remedial work. Accurate diagnosis of class 
and individual pupil difficulties, coupled with application of specific 
remedy, is the heart of enlightened use of exact methods of teaching. 
The success of the remedial or corrective teaching depends upon the 
accuracy and detail with which the specific skills involved in success- 
ful achievement in the subject are identified and isolated in the test. 
Tests of the general survey type, or tests that report single unanalyzed 
scores, cannot supply this information in sufficient detail. Specific 
examples of these points will be found in later chapters dealing with 
special subject tests. 

Diagnosis as the basis for preventive work. An examination of the 
number and types of skills identified as a result of the diagnostic 
methods discussed in the preceding section leads to a suggestion of a 
still more constructive use of analytic and diagnostic test results. 
Diagnosis as applied in education has taken on a meaning indicative 
of a breakdown in method, a failure of instructional techniques to 
function. Unquestionably, one of the basic purposes of diagnosis is 
the location of weaknesses and the determination of their causes, 
but there is nothing in the method that precludes its use in the pre- 
vention of weaknesses through anticipation of their causes. Out of 
the knowledge gained through the use of diagnostic procedures 
should come the basis for preventive work of all types. It is quite 
noticeable that the major emphasis in the fields of dentistry and 
medicine is not on correction but on prevention. The existence of a 
weakness implies a failure at some point in the program. The dis- 
covery of it should not be marked as important merely because it is 
then possible to correct it. The real importance in the discovery 
should lie rather in the prevention of its reappearance elsewhere 
under similar conditions. 

Another illustration from the field of medicine may make this 
point somewhat more concrete. In every medical examination for 
diagnostic purposes, a complete analysis is made and an exact case 
record of all observations is kept. Out of the analysis of these records 
has come a better understanding of the causes and characteristics of 
certain types of human ailments. Out of this same type of analysis 
has also come the basis for much of the preventive work that char- 
acterizes modern medical science. In a similar way, accurate and de- 
tailed educational diagnosis may ultimately offer the basis for the 
development of a program of preventive work in education. For ex- 
ample, if, after diagnosing the addition of fractions in the fifth grade, 
it is found that the failure of pupils to reduce fractions in the an- 
swers is a common weakness, the obvious thing to do is to correct the 
defects at once and then proceed to reconstruct the first instruction 
so that the following year the causes for this particular weakness 
may not operate so powerfully. Similarly, any weakness identified 
now should afford the basis for decisions calculated to reduce the 
probability of its recurrence in the future. 




The place of remedial instruction 


General practice exercises vs. remedial drill. There are in general 
two ways of maintaining a high level of pupil achievement in any 
subject after direct instruction has been discontinued. These are (1) 
broad, general drill with no integral units of testing to discover break- 
downs in pupil mastery, and (2) systematic remedial drill devices 
to fight forgetting, plus diagnostic testing to discover the exact causes 
of weaknesses when such weaknesses begin to cause poor work on re- 
view drills. The first method involves the systematic use of properly 
distributed general practice over the complete function. The second 
involves the periodic location of the specific defects of each pupil by 
means of diagnostic tests and the immediate correction of these de- 
fects by the use of properly constructed remedial drill. 

Unquestionably the latter is the more economical method of main- 
taining mastery of desired skills on the part of a pupil. It is obvious 
that general review is valuable at times, but just to review with no 
specific idea of what the review is to accomplish is too naive and 
hopeful to be effective. The program that coincides most closely with 
the experience of successful teachers and with a sound psychology 
of learning calls for the following steps in approximately the order 
indicated: (1) teach, (2) review, (3) test for weaknesses whenever 
they appear, and (4) follow with remedial drill units on the specific 
weaknesses revealed by the tests. It may be worth while to note that 
material so constructed as to be effective for remedial purposes is 
also sound to use for initial instruction. In fact, the chief distinction 
between good content for initial teaching purposes and remedial drill 
purposes lies in when they are to be used. The most effective remedial 
drill for the pupil who does not have an adequate sight-meaning vo- 
cabulary for silent reading purposes is drill on the vocabulary he 
should have learned in the first place. 

Necessity for valid drill for each identified skill. If remedial work 
is to be effective, drills of established validity must be provided for 
each specific skill which conditions achievement in the subject. The 
validity of drill material depends to a large degree upon the accuracy 
and completeness with which the analysis of skills is made. Difficulties 
in subject units which can be identified in only a vague manner can- 
not be remedied except by chance. Drills must closely parallel the 
skills they are supposed to remedy. If mastery of a certain minimal 
vocabulary is essential to effective silent reading comprehension, then 
drill on those particular words that constitute special weakness should 
take precedence over other drill. 

Theoretically, perfect validity of drill material can be achieved only 
by taking a 100 per cent sampling of all of the possible basic facts 
or skills in the particular field. Naturally this is impossible in cer- 
tain cases, but it is nevertheless often possible to take such a large 
sampling that all of the most frequently used and most important 
facts are included. Subject fields vary widely in the ways in which 
they lend themselves to sampling of this kind. In fields such as read- 
ing or language, a complete sampling is almost impossible to obtain. 
On the other hand, many of the basic facts in arithmetic are so 
readily identified that they may be sampled 100 per cent without 
difficulty. 
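
The idea of a complete sampling can be made concrete with a short sketch. It assumes, purely for illustration, that the "basic facts" in question are the single-digit addition combinations, and the function names and the sample drill are hypothetical. The sketch lists every such combination and reports what fraction of them a given drill actually covers.

```python
from itertools import product


def basic_addition_facts(max_addend=9):
    """Return every ordered pair (a, b) with 0 <= a, b <= max_addend.

    Treating these pairs as the 'basic facts' is an assumption made
    for illustration only.
    """
    return set(product(range(max_addend + 1), repeat=2))


def coverage(drill_items, universe):
    """Fraction of the universe of facts that the drill touches."""
    covered = set(drill_items) & universe
    return len(covered) / len(universe)


facts = basic_addition_facts()

# Hypothetical drill: only combinations whose sum does not exceed 9.
drill = [(a, b) for (a, b) in facts if a + b <= 9]

print(len(facts))              # number of basic facts in the universe
print(coverage(drill, facts))  # share of the facts the drill samples
```

A drill whose coverage falls short of 1.0 leaves some basic facts unpracticed; in a field such as reading, where the universe cannot be listed in this way, no such check is possible.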

Properly designed remedial and corrective drill material wastes no 
time on skills that need no practice, but strikes directly at the heart 
of the trouble. Remedial drills in which careful control is kept over 
the distribution of practice on the basic skills are almost certain to be 
more effective than random exercises, even assuming in both cases 
that suitable motivation for improvement is provided. That drill will 
be most productive which most nearly provides a complete coverage 
of the skills of basic importance in the hierarchy of habits upon which 
successful achievement in the subject depends. Poorly organized 
drills may or may not deal with all possible weaknesses, but they are 
almost certain to waste time on skills that are not in need of drill. The 
validity of the drill depends upon the degree to which this sampling 
covers the basic or fundamental skills and the degree to which the 
exercises themselves actually develop the skills they purport to de- 
velop. There are a number of places in which this complex chain may 
break. The task of diagnostic and remedial treatment is to locate and 
repair quickly those links of the chain that have snapped under stress, 
or have rusted out through lack of use. Correctly designed remedial 
material will not only parallel valid drill on the correct skills, but it 
will also cover all of the basic aspects of the skill. Furthermore, it 
should acquaint the child with the most important variants of each 
situation. 

Effective remedial material must not only cover in a valid manner 
all of the basic or underlying skills upon which achievement in the 
field depends, but it must provide a means for bringing about a gradual 
union of these component elements into the total function. It is 
entirely possible that a mastery of the subsidiary skills involved 
might result in only a partial control of the end product, if that goal 
were not reached by the gradual bringing together of each distinct 
skill in its relation to the whole process. 


3 PLANNING TESTING PROGRAMS 


Steps in a testing program 


The following brief outline is presented here as a suggestion to the 
teacher, supervisor, or administrator for the general organization of 
the testing program: 


1. Select and state a clear-cut teaching problem in the solution of which 
test results appear to be essential. 

2. Secure the cooperation of the school staff in the attack on the 
problem. 

3. Determine what types of test data will be valuable in the solution of 
the problem. 

4. Select the best available tests for the purpose. 

5. Make careful preparation and then administer the tests. 

6. Score the tests as quickly, accurately, and economically as possible. 

7. Tabulate the scores and analyze and interpret the results. 

8. Use the results and the interpretations in the elimination or improvement 
of the conditions revealed, depending upon the nature of the 
problem. 


Clear-cut teaching problem as basis. One of the most common 
errors made by teachers and supervisors is the inauguration of a test- 
ing program without first formulating a clear-cut problem the solution 
of which can be most advantageously reached through the use of tests. 
The problem should be sharply defined, for the testing program will 
thus be more limited in extent and more intensive. The best construc- 
tive supervisory work will result from careful and intensive cultiva- 
tion of a limited field. If the work is undertaken in this way, much 
time will be saved and one of the most common criticisms—that the 
time of the pupil and teacher is taken for the testing and nothing ever 
comes of it—will be avoided. Both teacher and pupils have a right 
to profit from a knowledge of the conditions revealed by the testing. 

Illustrations of problems. Problems suitable to form the basis for 
a testing program are to be found in almost all the fields of education. 
Frequently these problems overlap. It is not uncommon for one 
test or a series of closely allied tests to contribute to the solution of 
several problems. The problems listed below are classified in accord- 
ance with their interest to teachers, administrators, and supervisors. 
It is clear, of course, that the list is not exhaustive. 


A. PROBLEMS PRIMARILY OF INTEREST TO TEACHERS 
AND SUPERVISORS 


1. The discovery and diagnosis of defects of individual pupils in the 
various subjects or in particular phases of a subject as the basis for 
a remedial program. 

2. The determination of how the achievement of the pupils and the 
class compares with the norms in the different subjects. 

3. The determination of the progress or growth of the class in the 
different school subjects over a given period. 

4. The determination of whether different phases of subjects are being 
properly or unduly stressed, as indicated by relative accomplish- 
ments of the pupils. 

5. The determination of the extent to which the pupils are working at 
maximum capacity. 

6. The evaluation of the effectiveness of a given organization of in- 
structional material. 

7. The evaluation of the efficiency of a given method of instruction. 

8. The experimental evaluation of textbooks. 


B. PROBLEMS PRIMARILY OF INTEREST TO ADMINISTRATIVE 
AND SUPERVISORY OFFICERS 


1. The determination of the misplacement of pupils in grades or 
sections. 

2. The proper classification of new pupils entering the school system. 

3. The division of classes into sections according to ability. 

4. The selection of pupils for special classes, such as classes for exceptionally 
bright or exceptionally dull pupils or for pupils having 
special defects in certain subjects. 

5. The determination of the efficiency of the school as a whole by 
comparison of scores with norms and with scores made by other 
schools or grades. 

6. The determination of whether the proper emphasis is given to all 
subjects or whether some subjects are overstressed. 

7. The comparison of different methods of instruction or comparison 
of new methods with the ones already in use. 

8. The determination of the general achievement level of a grade, a 
school, or a system. 


9. The measurement of the progress of a grade, a school, or a system 
for a semester, a year, or any given period. 

10. The determination of whether or not a grade, a school, or a system 
is achieving what can fairly be expected in terms of current educational 
costs. 

11. The evaluation of the effect of a special supervisory drive. 

12. The compilation and use of educational and vocational guidance 
information. 

13. The provision of answers to current local queries concerning the 
over-all efficiency of the school system. 


When to give tests. The type of testing program followed depends 
somewhat upon the purposes the tests are to serve and the nature of 
the tests selected. If the tests used are survey tests of general achieve- 
ment, they are usually given early in the school term and then perhaps 
again a few days before the end of the school term. This procedure 
permits the teacher to determine the improvement his pupils have 
made during this period. Tests that are used definitely for survey pur- 
poses and are given only once during the school year are usually ad- 
ministered at or near the end of the year. This is probably one of the 
least important times for tests to be given, since almost the entire 
school year is gone and there is little opportunity for the teacher to 
attempt to do anything about the conditions revealed by the tests. If 
only a single cross-section of the school is taken, this should undoubtedly 
come early enough in the school year to permit the teacher 
to profit from the findings. The periodic use of educational tests to 
measure class or individual pupil improvement is by far the most 
profitable practice. 
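
The measurement of improvement between an early and a late testing amounts to a simple subtraction for each pupil. The following is a minimal sketch of that computation; the pupil names and scores are invented for illustration, and the sketch assumes that comparable forms of the same test were used at both testings.

```python
def gains(early, late):
    """Gain for every pupil who has a score at both testings."""
    return {pupil: late[pupil] - early[pupil]
            for pupil in early if pupil in late}


# Hypothetical scores on comparable forms of one survey test.
early = {"Ann": 42, "Ben": 55, "Cara": 61}
late = {"Ann": 50, "Ben": 54, "Cara": 70}

gains_by_pupil = gains(early, late)
mean_gain = sum(gains_by_pupil.values()) / len(gains_by_pupil)

print(gains_by_pupil)  # individual improvement
print(mean_gain)       # class improvement
```

A pupil whose gain is zero or negative, such as the second pupil in this invented class, is a natural candidate for diagnostic testing.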

A further refinement of the idea of giving tests early in the school 
year is found in their use immediately following the completion of 
instruction on a particular course unit. Unit achievement tests, each 
designed to measure a specific area of the course, are proving popular 
for this purpose with both teachers and pupils in certain subjects. 
By using these narrow-function tests immediately after the comple- 
tion of the teaching of a specific instructional unit the teacher secures 
immediate information about the weaknesses of his class. Special in- 
adequacies of instruction are thus made clear, and he can proceed at 
once to set up a remedial program before the class has moved on to 
other activities. This suggests the continuous use of tests as the basis 
for remedial work. The information provided by the use of these 
numerous narrow-function tests is also valuable in organizing future 
instruction in such manner as to prevent the appearance of such 
weaknesses. 

Cooperative testing programs. During the past fifteen years, co- 
operative testing programs have developed in many cities and states 
for the purpose of providing a coordinated attack upon measurement 
and evaluation problems. These programs are very different in or- 
ganization, sponsorship, and objectives, but they typically provide 
testing services of such a nature that the participation of most 
teachers is limited either to the administration of the tests and use 
of results or to the use of results alone. 

1. City testing bureaus. Bureaus of testing and measurement in a 
number of the larger cities maintain staffs of measurement and re- 
search specialists whose primary functions are to carry on planned 
testing programs and also to conduct related research studies. Frequently 
the cooperation of teachers is obtained in the administration 
of tests, and the results are made available to them for use with their 
pupils. Programs are frequently planned in cycles of several years, 
and tests in line with the total program may be given annually, twice 
a year, or at more frequent intervals. 

2. State-wide testing programs. Testing and related services are 
now available to the schools of approximately three-fourths of the 
states through some public educational agency in each state.⁷ Several 
of the states have two such programs. Frequently the testing pro- 
grams are based on cooperative construction, administration, and 
scoring of the tests and uniform methods of reporting results in 
comparable form. In other cases available standardized tests are 
cooperatively administered and scored and the results are reported 
in as uniform a manner as possible. State-wide norms are frequently 
provided. New forms of tests are constructed or provided annually 
in some state-wide setups. 

These programs and services vary widely among the different 
states, and include various patterns of achievement, intelligence, and 
personality tests. Some of the programs are conducted as scholarship 
contests, some are cooperatively sponsored by collegiate institutions 
that make use of scholastic aptitude test results of high-school seniors, 
some are conducted primarily for purposes of supervision, and 
still others are administered purely as services to the schools. Schools 
in some states participate in the programs on a cost basis, and participation 
is most often optional for each school. 


7 David Segel, State Testing and Evaluation Programs, U. S. Office of Education, 
Circular No. 320. Federal Security Agency, Washington, D. C., 1951. 

3. Nation-wide testing programs. Cooperative testing on a nation- 
wide basis is offered by various educational foundations, cooperative 
services, and commercial agencies for a wide variety of educational 
and mental tests. The services are sometimes provided primarily for 
a particular group of schools and in other cases are provided for any 
school wishing to obtain them. Reports are furnished to participating 
schools, and norms are often prepared on regional and nation-wide 
samplings of pupil test results. 


4 SELECTING TESTS 


Test selection depends upon the type of testing program planned, 
for tests should be chosen that not only are within the proper subject 
field and at the appropriate level of advancement for the pupils but 
that also will serve the desired function. It should be pointed out that 
not all tests are appropriately named, however, and that too much 
dependence can easily be placed upon a test title. Accordingly, the 
student and teacher should learn to utilize critical standards in the 
selection of testing instruments. 


Need for care in selecting tests 


As has been implied above, the mere fact that a test is standardized 
guarantees neither its validity for the type of use prescribed by its 
author and publisher nor its validity for the use to which a teacher 
wishes to put it. A valid test of achievement should consist of items 
that are in harmony with the accepted objectives of the subject in 
question. Yet, it has been found that in five tests of English usage 
from one-sixth to more than one-half of the usages scored as wrong 
in the different tests are acceptable in terms of the standards acceptable 
to the National Council of Teachers of English.⁸ 

The teacher or administrator selecting standardized tests has a 
right to expect that accurate information will be furnished him by 
the author and publisher concerning the validity, reliability, and other 
criteria of a good examination. Valuable sources of evidence about 
tests are the Mental Measurements Yearbooks, published in new 
editions at frequent intervals.⁹ These yearbooks contain carefully 
edited descriptions and critical reviews of tests by subject and test 
specialists with which to supplement information about a test furnished 
by the author and publisher. 


8 Karl W. Dykema, "On the Validity of Standardized Tests of English Usage." 
School and Society, 50:766-68; December 9, 1939. 


Test rating scales 


In the discussion of criteria for tests in a previous chapter, no at- 
tempt was made to evaluate in a definite manner any of the items 
that appear to affect test quality. Numerous rating devices that 
weigh the various items roughly in order of importance are available. 


Standardized Achievement Test Rating Scale 


Criteria                    Weightings    Reasons    Reasons 

1. Validity                     20 
2. Reliability                  10 
   a. Adequacy                  10 
   b. Objectivity               10 
3. Practicality                  5 
   a. Administrability          10 
   b. Scorability               10 
   c. Economy                    5 
4. Comparability                15 
5. Utility 

Totals 


Summary statement of major reasons for preference 


9 Oscar K. Buros, editor, (1) The Nineteen Thirty Eight Mental Measurements 
Yearbook. Rutgers University Press, New Brunswick, N. J., 1938; (2) The Nineteen 
Forty Mental Measurements Yearbook. Mental Measurements Yearbook, Highland 
Park, N. J., 1941; (3) The Third Mental Measurements Yearbook. Rutgers University 
Press, New Brunswick, N. J., 1949; and (4) The Fourth Mental Measurements 
Yearbook. Gryphon Press, Highland Park, N. J., 1953. 


The assignment of point values to the different features of the tests 
is, of course, largely a subjective procedure. It is obvious that two 
different individuals using rating scales could not be expected to agree 
closely on the scores assigned to a particular test. However, in spite 
of these limitations, such rating scales are of very real value to the 
inexperienced teacher or student because of the definite way in which 
attention is called to the quality features of a test. 

The accompanying rating scale is suggested for use when two or 
more standardized achievement tests are being considered for use 
in a situation in which the purpose is well defined and the pupils to 
be tested have been decided upon in advance. If the tests are com- 
parably rated by the same person, the resulting total scores should 
lead to the selection of the test that will best serve the purpose. The 
scale is organized in terms of the criteria of a good examination out- 
lined in Chapter 4, and the weights assigned the various criteria are 
thought to represent their relative significance in total test validity. 
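
The comparison of total scores described above can be sketched in a few lines. The sketch assumes that each weight is the maximum number of points a test may earn on that criterion and that the total is the plain sum of the points awarded. The weights, test names, and ratings shown are hypothetical placeholders rather than the values of the accompanying scale.

```python
# Hypothetical maximum points per criterion (placeholders only).
WEIGHTS = {
    "validity": 20,
    "reliability": 20,
    "practicality": 25,
    "comparability": 15,
    "utility": 20,
}


def total_rating(ratings, weights=WEIGHTS):
    """Sum the points awarded, checking each against its maximum."""
    total = 0
    for criterion, max_points in weights.items():
        points = ratings.get(criterion, 0)
        if not 0 <= points <= max_points:
            raise ValueError(f"{criterion}: {points} is outside 0-{max_points}")
        total += points
    return total


def best_test(all_ratings, weights=WEIGHTS):
    """Name of the test with the highest total rating."""
    return max(all_ratings, key=lambda name: total_rating(all_ratings[name], weights))


# Invented ratings assigned by one rater to two tests.
ratings = {
    "Test A": {"validity": 16, "reliability": 15, "practicality": 20,
               "comparability": 10, "utility": 15},
    "Test B": {"validity": 18, "reliability": 12, "practicality": 15,
               "comparability": 12, "utility": 16},
}

print({name: total_rating(r) for name, r in ratings.items()})
print(best_test(ratings))
```

Because the point values are subjective, totals of this kind are comparable only when the same person has rated all of the tests under consideration.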

A supplementary check list for use before the rating scale is filled 
out appears in another accompanying illustration. Provided with 
spaces for recording significant information about a test, the check 
list should be used separately for each test under consideration. Such 
sources of information as the publisher's catalog, the test manual and 
other testing materials, and even the Mental Measurements Year- 
books and other sources of critical test reviews might well be con- 
sulted in filling out such a check list. 


Standardized Achievement Test Check List 


Title ________________ Copyright ________________ 

Author(s) ________________ Publisher ________________ 

1. Validity (Measures what it attempts to measure; Proper purpose and 
level) 
Analysis of ________________ 
Recommendations of ________________ 
Accomplishment of ________________ 
Rise in success by ________________ 
Social utility ________________ 

2. Reliability (Measures what it does measure; Consistency) 
Reliability coefficient(s) of ________________ type, of size: 
______ on ______ cases in Grade(s) ______ from ______ schools in ______ states. 
______ on ______ cases in Grade(s) ______ from ______ schools in ______ states. 

a. Adequacy (Wide sampling of items in outcome(s) measured) 
No. of booklet pages ______; No. of items ______; Testing time ______ 
Types of outcomes measured ________________ 

b. Objectivity (Absence of subjectivity or bias in correct answers) 
Types and numbers of items: 
Recognition: Alternate-response ______; Multiple-choice ______; Matching ______ 
Recall: Simple recall ______; Completion ______ 
Miscellaneous ______ 

3. Practicality (Practical considerations; Feasibility) 

a. Administrability (Ease of administering) 
Working time on Part(s): I ______; II ______; III ______; IV ______; 
V ______; VI ______ 
Directions ______; Preparation for giving ______; Pupil instructions ______ 
Materials needed: Booklets ______; Answer sheets ______; Special pencils ______; 
Manual of directions ______; Other ______ 
Administered by: Teacher ______; Specialist ______; Psychologist ______ 

b. Scorability (Ease of scoring) 
Scoring key ______; Scoring directions ______; No. of separate scores ______ 
Scored by: Clerk ______; Teacher ______; Psychologist ______; Machine ______ 

c. Economy (Cost in money and time) 
Booklets reusable ______; Separate answer sheets ______ 
Cost: Booklet ______; Answer sheet ______; Special materials ______ 

4. Comparability (Bases for interpreting results) 
Norms: Age ______; Grade ______; Percentile ______; Other ______ 
Raw or derived scores ______; No. of duplicate forms ______; 
Norms based on ______ cases. 

5. Utility (Use to be made of results) 
Need to be served ________________ 
To be used in Grade(s) ______ or with ________________ 
Planned use of results ________________ 
Class record ______; Pupil profiles ______; Diagnostic aids ______; Instructional 
aids ______; Other special materials ______ 

5 ADMINISTERING TESTS 


The general procedures for administering tests suggested below 
are common to most tests now in use. They are not intended to take 
the place of the directions accompanying the various tests that may 
be used. The directions for giving and for scoring supplied in the ex- 
aminer’s manuals that accompany the better tests should be rigor- 
ously followed in order to guarantee that the tests are given under 
standard conditions. 


Preparation for testing 


Any individual who is reasonably skillful in discipline and who will 
carefully follow the directions accompanying the tests should be able 
to administer a modern educational test. Unless the test directions 
are extremely familiar, the examiner should study the manual care- 
fully before attempting to give the test. If possible he should admin- 
ister the test to some other person in order to gain further familiarity 
with the procedure. If this is not possible, the directions should be read 
aloud several times so that they may be followed easily as the test 
is given. Familiarity with the directions is essential if the standard 
conditions for the test are to be maintained, and valid comparison of 
results with the norms thus be made possible. 

Pupils may be tested in ordinary classroom groups or in larger 
groups. If several grades are to be given the same test, time may be 
saved by moving all pupils into a larger room, care being taken that 
the seats and the desks are suitable. 

Before the test folders are given out, the desks should be cleared 
and each pupil should be provided with a sharpened pencil, or, if the 
test is to be scored by machine, with a suitable electrographic pencil. A 
number of extra pencils should be available for emergencies during the 
examination. The room should be quiet throughout the test. No ques- 
tions should be allowed during the test. A manner that is agreeable 
but that at the same time suggests authority should be cultivated. 
Pupils should be made to feel “at home" in taking the test. Pupils 
will look forward to taking tests without fear or nervousness if the 
tests are properly given and if no misconceptions about the meaning 
and use of the results are allowed to arise. 


Administration of the tests 


Throughout the examination, directions should be given in a force- 
ful manner and should be spoken slowly and with careful attention 
to emphasis. The voice should be just loud enough to carry to all parts 
of the room. The directions accompanying the tests should be followed 
verbatim. As far as possible disturbances within or without the room 
that might interfere with the administering of the tests should be 
prevented. To avoid interruptions, the teacher may prepare a card 
carrying these words: Testing Going On, Please Do Not Disturb. If 
this card is hung on the outside of the classroom door, interruptions 
will be less frequent. 

The time limits as set in the directions for giving the tests should 
be strictly observed. Tests should be timed to the second or the results 
may not be comparable to those others get when the exact time is 
taken for the test. In timing the test a stop watch is very desirable. 
If an ordinary watch is used, one having a second hand, so that the 
minute and second hands can be synchronized, is preferable. The fol- 
lowing illustrative procedure will serve quite well if a stop watch is 
not available: 


                                                 Hr.  Min.  Sec. 
(a) Record time starting signal is given ......  11   18    20 
(b) Add to this the time required for the test        15    00 
(c) The sum is the time to signal a stop ......  11   33    20 
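
The same addition can be checked mechanically. The sketch below is offered only as an illustration and uses Python's standard datetime module. It assumes a starting signal at 11:18:20 and a working time of fifteen minutes, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def stop_time(start, minutes, seconds=0):
    """Clock time at which to signal a stop.

    start is the time of the starting signal as "HH:MM:SS";
    minutes and seconds give the working time allowed by the test.
    """
    fmt = "%H:%M:%S"
    end = datetime.strptime(start, fmt) + timedelta(minutes=minutes, seconds=seconds)
    return end.strftime(fmt)


print(stop_time("11:18:20", 15))  # prints 11:33:20
```

The carrying of seconds into minutes and of minutes into hours, which is the usual source of error when the sum is worked by hand, is handled by the date arithmetic.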


Teacher responsibility 


In the earlier stages of the development of standardized tests, it 
was believed that the most valuable results came from their use in 
a periodic survey by persons other than the classroom teacher. 
More recently it has come to be generally accepted that as many of 
the tests as possible should be given by the classroom teacher. This 
seems to be especially true in the case of tests that furnish informa- 
tion of special importance in the improvement of instruction. In addi- 
tion to allowing the classroom teacher to become acquainted with the 
technique of testing, it gives him a first-hand opportunity to observe 
the reactions of his individual pupils in the various test situations. 
On this account, it is believed that, wherever test results are to be 
used definitely as a basis for the discovery of individual pupil diffi- 
culties, tests should as far as possible be administered by the teacher 
himself. However, where the test results are used for a survey of 
achievement in the entire school or system, it is less important for 
the teacher to have an intimate contact with the testing program. As 
a matter of fact, many school administrators prefer not to have the 
teachers give the tests when they are used for such survey purposes. 


6 SCORING TESTS 


The scoring methods and devices discussed below are those used 
more or less widely with various standardized tests. Other procedures 
which are not especially designed for specific standardized tests but 
which are more widely used for the informal objective examination 
are discussed briefly in Chapter 7. 


Hand-scored tests 


The scoring of most modern standardized tests is made almost 
wholly objective by the use of stencil-type keys that fit the test or 
the separate answer sheets. The answer keys and directions for 
scoring each specific test should be followed rigorously. Scores should 
be obtained in exactly the manner prescribed by the test authors, in 
order that they may be compared directly with the norms that have 
been derived for the tests. It is best that all calculations be performed 
twice, and that all transcribed records be checked against the pupils’ 
test papers to make sure that no errors have been made. 

Hand-scoring keys of several types are used, among the most 
common being strip keys, cutout stencils, and transparent stencils. 
When answers are given in column form, strip keys that have the 
correct answers spaced on narrow strips of cardboard to correspond 
in spacing with the items of the test may be placed alongside a pupil's 
work for rapid scoring. When answers are scattered over a page and 
whenever the answer itself is the only point requiring the attention 
of the scorer, stencils having correct answers adjacent to apertures 
cut so that they will fall directly over the pupil's answers as the key 
is placed over the test also permit rapid scoring. Transparent stencils 
are similar to the above type, but are inconvenient because they do 
not permit the scorer to check the pupils’ answers directly on the test 
paper or answer sheet. 
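
What a strip key or stencil accomplishes by hand is an item-by-item comparison of the pupil's responses with the keyed answers. A minimal sketch of that comparison follows. The key, the responses, and the function names are hypothetical, and the sketch assumes one point for each correct response with no correction for guessing.

```python
def score_answers(key, answers):
    """Number of responses that agree with the key.

    An omitted item is recorded as None and earns no credit.
    """
    if len(key) != len(answers):
        raise ValueError("key and answer sheet differ in length")
    return sum(1 for keyed, given in zip(key, answers) if given == keyed)


def missed_items(key, answers):
    """Item numbers (counting from 1) answered wrongly or omitted."""
    return [number
            for number, (keyed, given) in enumerate(zip(key, answers), start=1)
            if given != keyed]


# Invented five-item multiple-choice key and one pupil's responses.
key = ["b", "d", "a", "c", "a"]
answers = ["b", "d", "c", "c", None]

print(score_answers(key, answers))  # raw score
print(missed_items(key, answers))   # items to examine for diagnosis
```

The list of missed items, rather than the raw score alone, is what makes the scored paper useful for diagnostic purposes.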

The matter of responsibility for scoring hand-scored tests con- 
stantly arises as an administrative problem in the smaller schools. 
Teachers are likely to feel that the responsibility for scoring stand- 
ardized tests given for supervisory purposes should not fall to them. 
A part of this difficulty arises through a failure on the part of the 
administrators to make perfectly clear to the members of the teach- 
ing staff their responsibility toward this type of work at the beginning 
of their terms of service. Most teachers, if given a suitable amount of 
time, do not seriously object to scoring test papers for their classes, 
particularly when they come to realize that this work may reveal 
information that will be extremely significant to them in the improve- 
ment of their teaching practices. If a real interest in the outcome of 
the testing program is stimulated by the supervisory officers, there 
will be little difficulty in inducing the teachers to help in interpreting 
the test papers for their classes. 


Fig. 5. Cutout scoring stencil for the Iowa Every-Pupil Tests 
of Basic Skills ¹⁰ 


10 Iowa Every-Pupil Tests of Basic Skills, Elementary. Houghton Mifflin Co., 
Boston, 1947. 


Self-scoring tests 


The Clapp-Young Self-Marking Tests ¹¹ and Scoreze answer sheets 
for certain of the California Achievement Tests ¹² consist of answer 
booklets with carbon or wax transfer paper so placed that the pupil's 
answers to multiple-choice items are impressed on the back of the 
sheet on which he marks them. Each four-page booklet, kept closed 
while the pupil takes the test, is opened for scoring. Correct answers 
appear in the designated spaces on the back of the sheet for ready 
counting, while incorrect answers appear outside of the designated 
positions. The Scoreze forms also provide an original and a duplicate 
copy of the diagnostic profile as well as grade, age, and percentile 
norms for the test. The Clapp-Young folders are adapted for direct 
use with a number of standardized tests and are also available in a 
generalized form for use with informal objective examinations. 


Machine-scoring devices 


The International Test Scoring Machine 13 scores pupil answer
sheets by means of an electrical current flowing through the lead
deposited by the pupil's electrographic pencil on the answer sheet.
Items of the alternate-response, multiple-choice, matching, and
modified completion types can be scored by this method. 14 Scores can be
obtained by experienced machine operators at the rate of 700 or more 
per hour. Special answer sheets are provided and directly adapted 
for use with many of the newer standardized tests, while standard 
answer-sheets in a variety of styles are available for the use of 
teachers or schools wishing to adapt their locally constructed tests 
to machine-scoring. The accompanying illustrations picture the test 
scoring machine and give examples of both types of answer sheets. 


11 Published by Houghton Mifflin Co.

12 Published by California Test Bureau.

13 Manufactured by International Business Machines Corporation, New York.

14 Methods of Adapting Tests for Machine Scoring. International Business
Machines Corporation, New York.
Fig. 6. International Test Scoring Machine


Use of separate answer sheets 


Prior to the development of machine-scoring devices, separate 
answer sheets for hand-scoring were used quite widely with teacher- 
made tests. The perfection of scoring machines in recent years has 
stimulated the use of separate answer sheets with many of the modern 
standardized tests. The need for long tests, required by demands for 
improved reliability of measurement with the resulting large and ex- 
pensive test booklets, has done much to popularize the use of sepa- 
rate answer sheets for purposes of speed and economy, if for no other 
reasons. This is especially true in large school systems having access 
to mechanical scoring devices. Many different types of separate an- 
swer sheets have been developed which are adaptable to a wide variety 
of testing techniques. Such generalized or special answer sheets are 
also well adapted to hand-scoring with stencil keys of the cutout 
variety. 

An exhaustive study of the effect on test validities and reliabilities 
of the use of separate answer sheets leads to the conclusion that the 
separate answer sheets can justifiably be used. 15 Dunlap reported that
there is no evidence to show that separate answer sheets cannot be 
used successfully with pupils in grades as low as the fourth. Some re- 
cent standardized tests provide separate answer sheets for pupils in 
the fourth and higher grades. However, for tests that are rather com- 
plex and require a complicated answer sheet, it is preferable that 
separate answer sheets should not be used much below the junior- 
high-school level. 


7 ANALYZING AND INTERPRETING RESULTS OF TESTING 


A complete discussion of the statistical techniques used in analyzing 
scores resulting from the administration of tests is given in Chapters 
12 and 13. Accordingly, this problem will not be discussed here ex- 
cept to make the pertinent remark that the modern teacher is ex- 
pected to understand and to be able to use such statistical techniques 
so that he will be able to obtain maximum values in using the results 
from tests given to his pupils. 

The results of testing are interpreted by the use of norms and also 
by the use of certain derived scores that are dependent upon norms.
A discussion of the derivation, and to a certain extent the application,
of norms for standardized achievement tests appears in an earlier
section of this chapter. Chapter 13 presents a rather complete
discussion of derived scores and norms.


Topics for Discussion 


1. What are the distinctive features of the standardized test?

2. Show how the process of standardization involves much more than 
the mere establishment of norms for a test. 

3. Indicate why the validation of content for standardized tests is 
more difficult for some school subjects than for others. 

4. Show how discriminative power in a test item contributes to its 
validity. 

5. What reasons can you suggest for the preparation of several equiva- 
lent and interchangeable forms of a standardized test? 

6. Discuss the major types of test norms and illustrate each. 

7. What factors appear to determine the type of norms that should 
be supplied with a standardized test? 


15 Jack W. Dunlap, “Problems Arising from the Use of a Separate Answer Sheet.” 
Journal of Psychology, 10:3-48; July 1940. 
8. What is the importance of determining the validity and reliability
of standardized tests in their final forms?

9. What should be the teacher's responsibility toward the use of
standardized tests in the classroom?

10. Suggest a procedure by which properly designed tests may be used
for individual pupil diagnosis. For class diagnosis.

11. In your opinion, what types of tests (intelligence, aptitude, general
achievement, diagnostic or analytic, personality) should the teacher
be encouraged to use most freely? Why?

12. Why is it desirable to have a clear-cut problem in mind in initiating 
a testing program? 

13. What is the possible contribution of a state-wide or other type of 
cooperative testing program to the solution of local testing prob- 
lems? 

14. What reasons can you advance for the failure to develop adequate 
diagnostic and remedial materials in all subject fields? 

15. Select a school subject and show how the basic skills may be identi- 
fied (diagnosed) in a way similar to that suggested in the discussion 
in this chapter. 

16. In a school field that you are likely to teach (your major or an
important minor), suggest a number of specific skills that enter into
successful work and parallel this with suggestions for remedial 
treatment. 
Constructing and Using Oral
and Essay Tests 


THE METHODS of using the oral and essay examination and the char- 
acteristics of these types of subjective tests mentioned below are 
the basis for the discussion of this chapter: 


Extent and importance of classroom testing. 
Limitations and advantages of the oral quiz. 
Place of the oral quiz in the schools. 
Limitations of the essay examination. 
Advantages of the essay examination. 
Improving the essay examination. 




The problems involved in the construction, selection, interpreta- 
tion, and use of standardized tests have been discussed in the previ- 
ous chapter. This chapter and the one that follows deal with the 
teacher-made or classroom test, as distinguished from the standard- 
ized test. From the point of view of many teachers, the classroom test 
constitutes the major problem of measurement. 


1 CLASSROOM TESTING


Extent of classroom testing 


Every teacher is faced with constantly recurring problems of 
measurement and evaluation in the classroom. Not all such problems 
are best solved by objective tests, for evaluation techniques of
relatively subjective types also have their place in the classroom.
However, each teacher spends days of time each year in preparing and
scoring tests and in analyzing and interpreting the results. It has 
been estimated that teachers average more than a week of school 
time annually in work with tests. 


Need for improvement in classroom testing 


The study of standardized tests thus far in this volume must make 
it apparent that by their very structure and use standardized tests 
do not meet all classroom needs for evaluation and measurement. 
In the first place, such tests do not equally well serve in all schools 
because of differences in emphasis and points of view resulting from 
the varying characteristics and educational needs of different
communities. Also, classroom testing is sometimes important in an area
so narrow or so specialized that no available standardized test fills
the need. Again, teachers sometimes feel that standardized tests
overstress factual knowledges and neglect what they believe to be 
important—the ability to organize and apply facts. For these rea- 
sons, written examinations prepared by local teachers, or at least 
within the local school system, will undoubtedly always be needed 
to meet the demands for complete and valid measurement of edu- 
cational achievement. 

Even a superficial observation of typical examination procedures 
of teachers makes it apparent, however, that the basic aims of exam- 
inations are not achieved in many instances, for the reason that the 
tests constructed and used by teachers fail to accomplish what is 
expected of them. It is indeed unfortunate that teachers, sometimes 
not realizing that their tests fail to accomplish the desired purposes, 
unduly penalize pupils for lack of success on the tests. It is neces- 
sary, therefore, that the weaknesses of classroom tests be recognized 
and that the proper steps be taken to bring teacher-made or class- 
room tests to as high a level of efficiency as possible. 


2 ORAL EXAMINATIONS 


Important as the oral quiz may be for instructional purposes, little 
need here be said concerning its use in the classroom for measurement
purposes. When used as a teaching device in the Socratic manner,
as a method of leading pupils by astute questioning to the
attainment of new understandings, oral questioning has teaching 
but not measuring significance. As a fact-finding technique in the 
interview, and in questioning the individual pupil on specific aspects 
of his work for obtaining diagnostic leads, oral questioning has 
evaluative possibilities. These, however, are not situations of the 
type in which oral examinations have most typically been used in 
the attempt to measure pupil achievement. As was pointed out earlier 
in this volume, Horace Mann sounded the death knell for such a use 
of the group oral examination more than a century ago. 1 Experimental
evidence from many studies has indicated that the oral
examination of individuals is seriously lacking in reliability and
validity. 2


Limitations of the oral examination 


To summarize Horace Mann's statements or implications, the 
oral examination: (1) is not equally fair and just to all pupils, (2)
does not test extensively or efficiently, (3) permits interference and
favoritism, intentional or otherwise, by the teacher, (4) is unjustifiably
time-consuming, (5) leaves no permanent objective record of
pupil performance, and (6) does not permit an evaluation of the 
difficulty of questions. While these indictments by Horace Mann 
accomplished very little in the sense of effecting any immediate wide- 
spread changes in examination practices, the weaknesses of the oral 
examination for measurement purposes have probably not since 
been stated more effectively. 


Advantages of the oral examination 


The oral examination or quiz does have some uses, however, in 
the evaluation and measurement of pupil performance, even though 
its values admittedly are not great when it is used in the classroom 
situation. Certain types of performances, such as oral language, 
pronunciation in the foreign languages, and group performances in 
debates and glee club competitions, can be evaluated only in terms 
of the oral production. However, the evaluation of such performances
is quite different from attempts to measure pupil achievement
by judging the quality of oral responses of a factual nature. Oral
questioning can be used with an individual pupil in probing his
reasons for having responded as he did to certain questions on written
examinations or on certain mathematical or scientific problems,
in an attempt to determine the causes of error. In this sense it is a
diagnostic testing tool. Oral questioning can be used in determining
how well an individual pupil has integrated his knowledge, can apply
it to various situations, and sees its implications. Oral examinations
may be used with individual pupils satisfactorily if proper advance
preparations are made, if consistent procedures are followed in the
question session, and if scoring and rating methods are systematically
applied. However, this use of the oral examination is very
time-consuming and highly subjective, qualities that make it
impracticable for use with each pupil in a class for purposes of pupil
comparisons.


1 Otis W. Caldwell and Stuart A. Courtis, Then and Now in Education, 1845-1923.
World Book Co., Yonkers, N. Y., 1923. p. 3

2 I. N. Thut and J. Raymond Gerberich, Foundations of Method for Secondary
Schools. McGraw-Hill Book Co., Inc., New York, 1949. p. 163.

In considering the above legitimate uses of oral questioning, it 
should be clearly noted that the conditions under which this method 
is properly used and the purposes it is appropriately expected to 
serve are very different from those operating when it is used with 
a group of pupils in the classroom to determine educational achieve- 
ment. In general, the oral examination has relatively little utility in 
the classroom for measuring achievement, especially as a basis for 
determining pupil marks in a course. 


3 ESSAY EXAMINATIONS


The traditional or essay examination continues to occupy an 
important place among the testing techniques used by the classroom 
teacher, although during the past few decades it has lost the domi- 
nant position it occupied at the turn of the century. Skepticism 
concerning the traditional examination arose more than a decade 
before 1900. 3 Edgeworth published in England during 1890 what
was perhaps the first critical study of the essay test. 4 It remained,
however, for Starch and Elliott to bring the issue sharply to the
front in America in 1912 by a report of marks assigned to an English
examination paper by various teachers 5 and to follow it
shortly by similar reports on two other subject fields. Although it
is probable that educators for various reasons somewhat misinterpreted
the findings of these and many subsequent studies of the
traditional examination, the fact remains that the studies very
effectively called attention to a major weakness of this testing
technique.


3 I. L. Kandel, Examinations and Their Substitutes in the United States. Carnegie
Foundation for the Advancement of Teaching, Bulletin No. 28. The Foundation,
New York, 1936. p. 27-35.

4 F. Y. Edgeworth, “The Element of Chance in Competitive Examinations.”
Journal of the Royal Statistical Society, 53:460-75, 644ff.; 1890.


Limitations of the essay examination 


Two major limitations and several related minor limitations char- 
acterize the essay examination. The two major limitations of the 
essay examination, (1) limited sampling, and (2) subjectivity of
scoring, are discussed in some detail in the following paragraphs,
and the minor limitations are discussed briefly. 

Limited sampling. The first major limitation of the essay exami- 
nation is its limited sampling of the content of the course. A test 
that consists of five or ten questions cannot hope to sample widely 
over any sizable area of content or activities, but can measure only 
a few of the important areas in which pupil abilities should be tested. 

Figure 8 shows in graphic form an hypothetical testing situation 
that points out sharply the undesirable results of limited sampling. 
Each one of pupils A, B, C, D, and E knows exactly half of the 
material over which the test is to be given. However, the particular 
facts mastered by each pupil are not the same throughout. For 
example, Pupil A, who was perhaps regular in his attendance dur- 
ing the first half of the course, has a mastery of the earlier units 
of the course. This is indicated by the shaded portion of the column. 
The second pupil, through irregular attendance, spasmodic prepa- 
rations, or other unknown causes, mastered a few of the facts, missed 
another section, and then perhaps learned a few more. Pupil C was 
just as irregular in his attendance, but for some reason learned 
exactly those items missed by Pupil B. Pupils D and E show other 
variations of the situation. It might be carried on almost indefinitely, 
but these five cases are adequate to illustrate the entire range of 
variation due to sampling. 

5 Daniel Starch and Edward C. Elliott, “Reliability of Grading High School Work
in English.” School Review, 20:442-57; September 1912.


Fig. 8. Effect of limited sampling on test scores


Now if a typical essay examination consisting of four questions
from the various areas of the course is given, it will be noted
from this diagram that distinctly different types of responses are
secured from these pupils. Pupil A, knowing the facts in the first part
of the work, responds to the first two questions and makes a score
of 50 per cent. The second pupil, B, by sheer chance or unfortunate
guidance in the selection of the facts he learned, misses each of the
four questions, and receives a zero score. Pupil C, through good
fortune (or judgment), happens to have mastered the items in the exact
areas sampled by the test, and thereby makes a perfect score on
the test. Pupils D and E, illustrating other variations due to chance,
score 75 per cent and 25 per cent on the examination. Thus there is
a variation of from 0 to 100 per cent on the examination taken by
five pupils each of whom actually has a mastery of exactly 50 per
cent of the facts. This type of error in measurement of achievement,
which unfortunately is not uncommon, is due to the factor of
inadequate sampling. The effect of increasing the sampling from four
items to many items is demonstrated by a further development and
discussion of this diagram on page 164 of Chapter 7.
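
The arithmetic of this illustration can be checked with a short Python sketch. It is offered only as an illustration under stated assumptions: the division of the course into eight units, the particular mastery patterns, and the names MASTERY and percent_score are hypothetical, chosen so that the four-question scores agree with those quoted above. They are not taken from Figure 8 itself.

```python
# Hypothetical layout: the course is divided into 8 equal units, numbered 0-7.
# Each pupil has mastered exactly 4 of the 8 units (50 per cent).
# The patterns below are assumptions chosen to match the scores in the text.
MASTERY = {
    "A": {0, 1, 2, 3},  # the first half of the course
    "B": {1, 3, 5, 7},
    "C": {0, 2, 4, 6},
    "D": {0, 1, 2, 4},
    "E": {0, 1, 3, 5},
}


def percent_score(known_units, tested_units):
    """Return the per cent of tested units that the pupil has mastered."""
    hits = sum(1 for unit in tested_units if unit in known_units)
    return 100 * hits / len(tested_units)


# A four-question essay test samples only four of the eight units.
four_question_test = [0, 2, 4, 6]
# A longer test that touches every unit.
full_coverage_test = list(range(8))

short_scores = {p: percent_score(k, four_question_test) for p, k in MASTERY.items()}
long_scores = {p: percent_score(k, full_coverage_test) for p, k in MASTERY.items()}

for pupil in MASTERY:
    print(pupil, short_scores[pupil], long_scores[pupil])
```

On the four-question test the five pupils score 50, 0, 100, 75, and 25 per cent. On the test that covers every unit each pupil scores 50 per cent, which is his actual level of mastery.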

Subjectivity of scoring. A second outstanding characteristic of
the essay examination is subjectivity of scoring. Starch and Elliott,
who had 142 teachers score identical copies of an English examination
paper, found that the scores based on 100 per cent for perfection
ranged from a low of 50 to a high of 98. 6 In another study, they
found that 115 teachers rated a geometry paper from a low of 28
to a high of 92. 7 Ruch had 91 teachers of geography score the essay
examination papers judged to be the best, average, and poorest
papers from a class on the basis of 20 for an entirely satisfactory
answer and 0 for an answer practically without discernible merit.
The range of scores on the best paper was from 3 to 20, on the
poorest paper from 0 to 2, and on the average paper from 2 to 20,
with average scores being 16.1, 0.1, and 10.) respectively for the
best, poorest, and average papers. 9


Fig. 9. Marks assigned to an English examination by 142 teachers 8


6 Ibid.

Eells 10 had 61 teachers score an examination consisting of four
essay questions in geography and history, and eleven weeks later 
had them score the same answers again. Reliability coefficients, ob- 
tained by correlating the first and second scores assigned by the 
same teachers, ranged from 0.25 to 0.51 for the four essay questions. 
This and other evidence showing wide differences in the two sets of 
scores assigned by the same persons, led him to conclude that the 
same individuals vary from time to time in their judgments about 
as widely as different individuals vary. 
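
A reliability coefficient of this kind is a product-moment correlation between two sets of marks. The following Python sketch shows how such a coefficient may be computed. The marks are invented for illustration and are not Eells's data, and the function name pearson_r is an assumption made for this example.

```python
from math import sqrt


def pearson_r(first, second):
    """Product-moment correlation between two equally long lists of marks."""
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    spread_x = sum((x - mean_x) ** 2 for x in first)
    spread_y = sum((y - mean_y) ** 2 for y in second)
    return covariation / sqrt(spread_x * spread_y)


# Invented marks (hypothetical): eight teachers score the same answer twice,
# several weeks apart. Position i in each list belongs to the same teacher.
first_reading = [12, 15, 9, 18, 14, 11, 16, 10]
second_reading = [14, 11, 13, 16, 10, 15, 17, 9]

reliability = pearson_r(first_reading, second_reading)
print(round(reliability, 2))
```

A coefficient near 1.00 would mean that each teacher's second mark closely followed his first. Coefficients such as those reported above, from 0.25 to 0.51, indicate that the second marks agreed only loosely with the first.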

Stalnaker, on the basis of an extensive experiment in the evalua- 
tion of English papers, concluded that “the typical essay test as 
typically handled ...is not reliably graded and, therefore, cannot 
stand alone as a good measuring instrument.” 11 From these and
other investigations it becomes apparent that the scoring of the
essay examination is a highly subjective process and that the resulting
scores are correspondingly inaccurate.


7 Daniel Starch and Edward C. Elliott, “Reliability of Grading High School Work
in Mathematics.” School Review, 21:254-59; April 1913.

8 Starch and Elliott, “Reliability of Grading High School Work in English.”

9 G. M. Ruch, The Objective or New-Type Examination. Scott, Foresman and
Co., Chicago, 1929. p. 78-81.

10 Walter C. Eells, “Reliability of Repeated Grading of Essay Type Questions.”
Journal of Educational Psychology, 21:48-52; January 1930.

11 John M. Stalnaker, “The Essay Type of Examination.” Educational Measurement.
American Council on Education, Washington, D. C., 1951. p. 499.

The effect of a lack of objectivity in the unit of measurement may 
be demonstrated to anyone who will try to measure the length of a 
table top by using a rubber band as the measuring instrument. The 
length of the table in rubber-band units depends on how much ten- 
sion is placed on the rubber band. Obviously no accurate measure- 
ment can result. 

The subjectivity of scoring shown by practically all studies of 
the essay examination is more the result of varying standards of 
expectancy among the teachers concerned than of any other cause. 
Such standards of expectancy vary from day to day, teacher to 
teacher, grade to grade, and school to school. Unfortunately, from 
the point of view of improving the accuracy of scoring the essay 
test, this limitation appears to be largely innate in the type of ex- 
amination itself. The establishment of uniform standards of achieve- 
ment in the teacher is probably a human impossibility. The remedy 
lies not in the attempt to produce it but in giving the teacher a 
tangible unit of measurement. 


TABLE 7. Shifting standards of expectancy

[Table body not reproduced. Surviving column heading: Quality of Products.]


Shifting standards of expectancy may be illustrated by the data 
of Table 7. Here are shown the shifts in standards that enter into 
teachers' estimates of school products. If it is assumed that a given
school product, such as a handwriting or drawing specimen, has a
rating-scale value of 50, it appears from the table that the specimen
might receive a superior mark at the fourth-grade level. It would
represent distinctly superior work for that grade. To the eighth-grade
teacher, however, the specimen would appear to be very inferior
as an eighth-grade product and a very poor mark might be
assigned to it.
Minor limitations. Another factor which affects the teacher in his
marking of examination papers, but which did not enter into the 
studies reported above, is that he typically knows the pupils whose 
papers he is marking and also ordinarily knows whose paper he is 
scoring at a given time. He is certain to be influenced by that knowl- 
edge. He is probably prone to give pupils who have previously done 
good work or at least made favorable impressions on him the ad- 
vantage of the “halo effect." This term describes the tendency to 
give high marks to such pupils in some instances where they are 
not deserved, by explaining to himself, perhaps unconsciously, that 
he knows they know the correct answers even though their responses 
to the questions are not highly satisfactory to him. Similarly, the 
pupils he has catalogued as of low ability are sometimes penalized 
by his tendency to consider their good answers merely “shots in the
dark" or as implying ideas that the pupils did not actually under- 
stand. 

Still another type of factor affecting the objectivity of scoring of 
an essay test is found in the influence upon the reader of hand- 
writing and general neatness of the paper; spelling, punctuation, 
and grammar; organization of the paper; and even its length. It is
certainly true that a neatly typed paper of mediocre content re- 
ceives the benefit of the doubt from most readers. It is also true that 
such characteristics as good handwriting, English usage, and or- 
ganization predispose the reader toward high marks, and it is self- 
evident that the slow writer is penalized if a premium is attached 
to length of responses apart from their quality in examinations 
where the time is rigidly restricted. Some teachers penalize a pupil 
for deficiencies of these types, but other teachers do not. Many 
teachers are also unconsciously affected by the quality of a paper 
read just previously to the one being marked. Moreover, the same 
teacher may penalize a pupil for such deficiencies one day and not 
do so on another occasion, depending on his mood at the moment, 
and may penalize some pupils and not others. 

Other influences which in considerable degree enter into the 
marking of tests are evidences of pupil effort, improvement, attitude 
toward the teacher and the course, conformance, and a multitude of 
other indications of what the teacher might consider desirable behavior
on the part of the pupil. Some teachers believe in assigning
relatively higher marks to pupils who try but do poorly than to
pupils who appear not to try but do well. Others assign good marks
to pupils who conform to sometimes inconsequential and irrelevant
demands and penalize pupils who do not conform.

The two types of scoring errors accounting for the subjectivity of
the essay test are known as constant errors and variable errors. Con- 
stant errors are those that result from a tendency to mark high or to 
mark low, i.e., to be an “easy” marker or a “hard” marker. Variable 
errors result from the tendency of all persons to vary in their judg- 
ments from time to time, according to their states of mind, the states 
of their digestions, and many other factors. 
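
The distinction between the two types of error can be made concrete with a small Python sketch. It rests on assumptions that go beyond the text: a reference mark for each paper is supposed to be available, the constant error of a reader is estimated by the average of his deviations from the reference marks, and his variable error by the spread of those deviations around that average. All marks and the name error_summary are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical reference marks for six papers (an assumption; in practice
# no such agreed marks may exist for an essay test).
reference = [70, 82, 65, 90, 75, 58]

# Hypothetical marks assigned to the same six papers by two readers.
marks = {
    "easy marker": [76, 87, 72, 94, 81, 62],
    "hard marker": [63, 77, 57, 86, 66, 53],
}


def error_summary(assigned, agreed):
    """Return (constant error, variable error) for one reader.

    Constant error: the average deviation from the reference marks.
    Variable error: the standard deviation of those deviations.
    """
    deviations = [a - r for a, r in zip(assigned, agreed)]
    return mean(deviations), pstdev(deviations)


summary = {reader: error_summary(given, reference) for reader, given in marks.items()}

for reader, (constant, variable) in summary.items():
    print(reader, round(constant, 2), round(variable, 2))
```

In this sketch the easy marker shows a positive constant error and the hard marker a negative one. Each reader's variable error is the part of his marking that would remain even if his general leniency or severity were corrected.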

Pupils who do not know the answers to essay questions are prone 
to respond in terms calculated to cover up their lack of information 
if not actually to mislead the teacher. Such responses, which tend 
to vary in plausibility directly in relation to the intelligence of the 
bluffer, may take the form of discussion concerning content closely 
related to that covered by the question, of very incomplete answers 
which by repetitious statements and copious illustration may give 
a sense of completeness, and various other devices. Whether bluff- 
ing is or is not desirable is not the issue. Certainly bluffing is re- 
sorted to in great or small degree by all persons on some, if not 
many, occasions. To the extent, however, that bluffing is actually 
successful on essay tests, the examination results are less accurate 
measures of pupil achievement. 

Stalnaker, after presenting evidence that the cost of securing an 
accurate reading (evaluation) of essay questions is practically pro- 
hibitive, summarized his position on the problems of the essay ex- 
amination as follows: “The accurate evaluation of a well-developed 
essay question is a long and difficult job and one which, properly 
done, requires intelligence, diligence, and consistency. The expense 
in time and money can be justified only to the extent that essay 
items are developed to measure reliably important objectives which 
cannot otherwise be measured.” 12


Advantages of the essay examination 


Only the major and rather commonly accepted advantages of the
traditional or essay examination will be discussed here. It should
be remembered here particularly that it is the total effect of the
examination which is important rather than the specific aspects
considered singly. An advantage, then, may not be an advantage
when there is balanced against it one or more dependent disadvantages.


12 Ibid. p. 502.

Ease of construction and administration. Essay tests are com- 
monly considered easy for teachers to prepare and to administer. 
Pupils feel that they know the nature of essay test questions and 
the traditional methods of answering them. Teachers typically give 
a minimum of time to the preparation of essay questions. Sometimes 
the questions are not even formulated until immediately before the 
examination is to be given. Some teachers even prepare the last part 
of the test while the pupils are writing on the first questions. Little 
or no time is taken for telling pupils how to take the essay examination.
However, essay tests prepared and administered with a minimum
of effort are likely to have such resulting disadvantages that
the saving of time and labor may well be at the expense of testing 
efficiency. 

Adaptability to school subjects. It is possible to use the essay 
examination for practically all subjects of the school curriculum, 
for the question and answer method is widely adaptable. Some types 
of outcomes, such as arithmetic skills, handwriting skills, reading 
ability, and others, cannot be tested directly by this device, but the 
factual backgrounds for them frequently can be so tested. As a
matter of fact, the essay test procedure is often used in scoring 
arithmetic examples by the use of arbitrary decisions in scoring for 
correctness of the result or correctness of the method, in giving 
partial credit for answers not entirely correct, and in various other 
ways. 

Measurement of higher mental abilities. Advocates of the older 
type of examination insist that the discussion-type questions have 
values not possessed by the informal objective test in that they call 
for comparison, for interpretation of facts, for criticism, for defense 
of opinion, and for other types of higher mental activity. Essay ques- 
tions allow for some range of choice, which makes possible the meet- 
ing of differences in courses and readings pursued. The purpose of 
the written test is primarily to ascertain whether the student has 
accurate knowledge and a considerable amount of understanding 
about a wide variety of matters in terms of their interacting rela- 
tionships but not basically to determine whether he knows certain 
highly detailed facts and whether he has met routine course requirements.
What is sought is a measure of accurate knowledge of
fact, understanding of complex ideas, and ability to interpret and 
to criticize and decide. In short, the questions are devised to test 
the pupil’s ability to make use of knowledge. This is particularly 
true for advanced students, for whom the testing of such types of 
higher abilities is more important than the testing of the broad 
factual knowledges that almost certainly have been acquired to a 
high degree. 


Advantages claimed for the essay test 


Various advantages have been claimed for the traditional ex- 
amination. Some of these advantages appear to depend on evidence 
that is not too conclusive. In many cases the decision depends as 
much on the philosophy of the individual teacher as on definite 
research findings, so that possible advantages cannot be claimed 
with certainty. 

Freedom of response. The freedom of response that the essay test 
question allows is considered by some students of examination meth- 
ods as one of its fundamental characteristics. By the nature of the 
question the student is required to survey his own background of 
related information and to select the related facts and organize 
them for expression in his own words. It is important, however, that 
the freedom of selection, organization, and expression be suited 
to the measurable outcomes of the course. 

Training in the use of written English. It has been contended by 
various persons in the past that training in the effective use of 
English is a logical function of the examination and that the essay 
test furnishes such training. However, neither contention is de- 
fensible. Courses in English provide training in the use of English, 
as do, indirectly and as by-products, many other types of school 
experiences. The examination, which has definite uses and purposes 
in the measurement area to occupy its attention, should not be 
expected to furnish training in the use of English, although, of 
course, written language is required in the essay test. Furthermore, 
the conditions under which the essay examination is typically given 
—pupils writing at high speed and without time to organize their 
thoughts carefully—are not conducive to the best use of English. 
Certainly examinations in such courses as language, composition, 
spelling, and perhaps reading and literature might be devised to
furnish training in the written use of English, but there seems to be
no justification for shortening the time given to the measurement
of direct course outcomes in such subjects as the sciences, social
studies, and arithmetic in order to furnish the pupils this type of
training.

Motivation of desirable methods of reviewing. Many teachers feel 
that some student groups prepare for essay tests more often by 
reviewing broadly the important aspects of course content but 
that they more frequently review for the objective test by memori- 
zation of facts or exact wordings of the textbook. No one would 
deny the general desirability of the first rather than the second 
type of review. However, such opinions are usually based on ob- 
servations of how a few groups of pupils say they prepare for 
examinations. Probably the type of examination is less important 
to the pupil in determining how he should review than the nature 
of the test. An essay examination may or may not stress detailed 
facts. An objective test may or may not stress detailed facts. Teach- 
ers differ markedly in the emphases they assign to factual learning 
and to applications of facts in the tests they give. 


Conclusions concerning the essay examination 


For many years the essay-type test has been subjected to intense 
criticism. In spite of these attacks, however, it is still in use in 
numerous classrooms and doubtless performs a worthwhile function
there. While it is true that when the essay test is subjected to a 
critical appraisal under research conditions many of the claims that 
have been advanced for it do not stand up any too well, it is also 
true that it performs certain functions in the classroom and for the 
pupils that certain of the other more objective forms of tests fail 
to accomplish. Without doubt the essay-type test is firmly fixed in 
educational practice. It is a type of examination with which all 
teachers are familiar, and with all of its faults it undoubtedly possesses
sufficient merit to warrant considerable attention to its improvement.

It is now recognized that only a portion of the variability of 
marks assigned to an examination by different teachers, as in the 
Starch and Elliott and other studies, can be attributed to the unreliability
of the essay examination itself. A comparable share of
the variability can be charged to the lack of uniformity in the 
scoring procedures followed by the teachers. Whereas the different 
teachers used in such studies had very different educational aims 
and standards of excellence, the teacher who scores an entire set of 
papers attempts to apply the same set of standards to all papers, 
and has the benefit of experience with previous papers as a basis for 
doing so. Furthermore, the teachers in those studies used no scoring 
rules save those which they developed individually, but the teacher 
who scores a set of papers usually applies more or less tangible 
and consistent scoring procedures. 

A final summation of the limitations and advantages of the essay 
examination cannot be conclusive. Certainly the limitations of the 
test as it has been, and perhaps even today is, most widely used 
greatly outweigh its advantages. However, it may be that when 
the essay test is used with optimum efficiency and for carefully 
defined purposes many of its advantages will be realized. 


4 IMPROVING ESSAY EXAMINATIONS 


Many suggestions for improving the essay examination have 
been made by students of this type of test. Most of the suggestions 
have to do with: (1) the selection of test content and the framing 
of questions, and (2) the scoring of test results. The discussion 
below presents a few of the approaches to the improvement of the 
essay test by these two methods, but does not attempt to consider 
how the test may be improved in the specific subjects of the school 
curriculum. 


Types of essay questions 


Monroe and Carter classified essay-type questions with respect
to the mental activity each type is designed to elicit in the pupil,
and presented both descriptive statements concerning, and examples
of, the twenty varieties they distinguished.13 The descriptive statements
and illustrative questions below are from Odell’s adaptation
and supplementation 14 of the questions from Monroe and Carter’s
list.


13 Walter S. Monroe and R. E. Carter, The Use of Different Types of Thought
Questions in Secondary Schools and Their Relative Difficulty for Students. Bureau
of Educational Research Bulletin, No. 14. University of Illinois, Urbana, 1923.


1. Selective recall—basis given. (Name the presidents of the United 
States who had been in military life before they were elected.) 

2. Evaluative recall—basis given. (Name the three statesmen who 
have had the greatest influence on economic legislation in the 
United States.) 

3. Comparison of two things—on a single designated basis. (Compare 
Eliot and Thackeray as to ability in character delineation.) 

4. Comparison of two things—in general. (Contrast the life of Silas 
Marner in Raveloe with his life in Lantern Yard.) 

5. Decision—for or against. (In which in your opinion can you do 
better, oral or written examinations? Why?) 

6. Causes or effects. (Why has the Senate become a much more power- 
ful body than the House of Representatives?) 

7. Explanation of the use or exact meaning of some phrase or statement 
in a passage. (Explain the meaning of the expression “Sinais climb”
in the line: “We Sinais climb and know it not.”)

8. Summary of some unit of the text or of some article read. (Sum- 
marize in about one hundred words the advantages of the hot-air 
furnace.) 

9. Analysis. (Mention several qualities of leadership.) 

10. Statement of relationships. (Tell the relation of exercise to good
health.)

11. Illustrations or examples (your own) of principles in science, construction
in language, etc. (Give an original sentence in Latin
illustrating the use of the infinitive in indirect discourse.)

12. Classification—usually the converse of No. 11. (To what group of
plants do the mosses and liverworts belong?) 

13. Application of rules or principles in new situations. (In what coun- 
tries other than Brazil would you expect to find rubber planta- 
tions?) 

14. Discussion. (Discuss the Monroe Doctrine.) 

15. Statement of aim—author’s purpose in his selection or organization 
of material. (What was the purpose of the author in having Athel- 
stane return to life after he was apparently dead?) 

16. Criticism—as to the adequacy, correctness, or relevancy of a printed
statement, or a classmate's answer to a question on the lesson.
(Criticize “Macbeth was wholly indifferent to the superstitions of
his time.”)


14 C. W. Odell, Traditional Examinations and New-Type Tests. Century Co.,
New York, 1928. p. 207-10.

17. Outline. (Outline in not more than one page the chief events of 
the French and Indian Wars.) 

18. Reorganization of facts. (Select the incidents which characterize 
Portia in The Merchant of Venice.) 

19. Formulation of new questions—problems and questions raised. (If 
you were asked to state how much you could trust the viewpoint 
of a particular historian about whom you know little or nothing, 
what questions would you want to have answered concerning him?) 

20. New methods of procedure. (How might the plot of Julius Caesar 
be changed to make it a comedy rather than a tragedy?) 


Questions of the essay type are commonly classified into three 
types: (1) simple-recall, (2) short-answer, and (3) discussion. The 
simple-recall questions, demanding a short response that can be 
accurately scored, require a name, a number, a date, a place, 
or an event in answer to who, how many, when, where, and what 
questions. The short-answer questions, demanding statement, phrase, 
or sentence responses that can be rated quite objectively, require 
answers to such key words as define, identify, list, find, and state.
The discussion questions, requiring responses of such complexity 
that objectivity of scoring is difficult, request answers to such words 
as discuss, explain, describe, compare, and outline. As most teachers 
are well aware, some essay questions are sufficiently definite that 
responses can be evaluated objectively, but others are so general 
that responses can be rated with reasonable accuracy only by the 
use of definite scoring rules or some similar method. 


Increasing the objectivity of scoring the essay test 


Approximately forty years ago, Kelly conducted an investigation
into the causes of variation in teachers’ marks on examination
papers.15 He found that the use of a rather definite set of rules
resulted in greatly reduced variations in scores when the papers
were rescored. More recently, Stalnaker obtained reliability coefficients
ranging from .84 to .99 for the scores assigned to essay examinations
in a variety of high-school subjects by experienced teachers
when scoring rules were used.16 These reliability coefficients show
a highly satisfactory degree of scoring accuracy, especially when
it is considered that only the lowest coefficient was under .90. Other
studies of the results obtained when the essay test was scored under
closely controlled conditions substantiate the conclusion that the
traditional examination can be scored reliably if proper precautions
are taken.


15 Fred J. Kelly, Teachers’ Marks. Contributions to Education, No. 66. Teachers
College, Columbia University, New York, 1914.
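
A coefficient of the kind reported above can be illustrated with a small computation. The sketch below assumes, purely for illustration, that the coefficient is the Pearson product-moment correlation between the scores two readers assign to the same papers; the text does not say how the reported coefficients were computed, and the scores used here are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores given to the same eight papers by two readers
# (assumption: both readers used the same scoring rules).
reader_1 = [72, 85, 60, 91, 78, 66, 88, 55]
reader_2 = [70, 88, 62, 90, 75, 70, 85, 58]

r = pearson_r(reader_1, reader_2)
print(f"agreement between readers: r = {r:.2f}")
```

A value near 1.00 means the two readers ranked and spaced the papers almost identically; it says nothing about whether both readers shared the same bias.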

Sims proposed a rating method of scoring essay examinations.17
He suggested that the readers work out for themselves acceptable
answers to the questions and then use the following procedures:


a. Quickly read through the papers and on the basis of your opinion of
their worth sort them into five groups as follows: (a) very superior
papers, (b) superior papers, (c) average papers, (d) inferior papers,
(e) very inferior papers. (Remember that in a normal group you
would expect to have approximately 10 per cent of very superior
and 10 per cent of very inferior papers, 20 per cent of superior and
20 per cent of inferior papers, and 40 per cent of average papers.
Do not, however, try to conform rigidly to this rule. Your group
may not be a normal one.)

b. Re-read the papers in each group and shift any that you feel have 
been misplaced. 

c. Make no attempt to give numerical grades or to evaluate each ques- 
tion. Place each paper on the basis of your general impression of 
the total. 

d. Assign letter grades to each group; beginning with A for the very 
superior group, B for the superior group, etc. 
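
The approximate proportions given in the first step can be turned into expected group sizes for a class of any size. The sketch below is only an aid to the reader's judgment, since the procedure itself warns against conforming rigidly to the percentages. The letters C, D, and E for the last three groups are an assumption; the source names only A and B and then says "etc."

```python
# Approximate percentages for a "normal" group, as stated in the procedure.
# Letters C, D, and E are assumed; the source specifies only A and B.
EXPECTED_PERCENT = {
    "A (very superior)": 10,
    "B (superior)": 20,
    "C (average)": 40,
    "D (inferior)": 20,
    "E (very inferior)": 10,
}


def expected_group_sizes(n_papers):
    """Split n_papers into five groups in roughly 10/20/40/20/10 proportion.

    Integer arithmetic is used throughout; papers left over after rounding
    down go to the groups with the largest remainders, so the sizes always
    add up to n_papers.
    """
    if n_papers < 0:
        raise ValueError("number of papers cannot be negative")
    sizes = {g: pct * n_papers // 100 for g, pct in EXPECTED_PERCENT.items()}
    remainders = {g: pct * n_papers % 100 for g, pct in EXPECTED_PERCENT.items()}
    leftover = n_papers - sum(sizes.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for group in by_remainder[:leftover]:
        sizes[group] += 1
    return sizes


for group, size in expected_group_sizes(30).items():
    print(f"{group}: about {size} papers")
```

The counts are a starting expectation only; the reader still places each paper by general impression and shifts papers on re-reading.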


Stalnaker reported evidence that the use of optional questions appears
to reduce the reliability of marking the essay examination and
recommended that optional questions be avoided.18

Wrightstone recommended that essay tests be designed to measure
only one objective of instruction at a time, such as interpretation of
facts, that all scorers agree on a definition of the objectives and on
certain standards of values, that an ideal answer be formulated and
each part assigned a certain number of points, and that an eleven-point
scale from 0 to 10 be used for each test unit.19


16 John M. Stalnaker, “Essay Examinations Reliably Read.” School and Society,
46:671-72; November 20, 1937.

17 Verner M. Sims, “The Objectivity, Reliability, and Validity of an Essay Examination
Graded by Rating.” Journal of Educational Research, 24:216-23; October
1931.

18 John M. Stalnaker, “The Essay Type of Examination.” Educational Measurement.
American Council on Education, Washington, D. C., 1951. p. 506.


The following suggestions, by largely eliminating the personal
judgment or bias of the scorer, have been found valuable for use in
scoring essay-type responses:


1. Examinations should be scored by the one who makes out the questions.
He should know exactly what responses are desired, and
should write out his answers to the questions in advance.

2. Each pupil taking the test should write his name on the back of the
test paper and the scorer should disregard the name until the test
is scored. This eliminates the subjective factor of being influenced
or biased in judgment because of former contacts with the pupil,
insofar as the teacher does not become aware of the writer's identity
through his handwriting or his manner of expression.

3. The scorer should not mark off for misspelled words or poor sentence
structure, paragraphing, or handwriting. Similarly, he should
not increase the score for excellence in these things. However, such
factors may be indicated or checked on the examination. The reason
for this lies in the fact that the function of the examination is to
measure the pupil’s abilities in a course and not his ability to write
or to spell. If it is desirable to test his ability to write, spell, or use
correct written English, suitable tests can be obtained for these purposes.

4. Each separate item should be scored in all of the papers consecutively.
This is preferable to the correction of each entire test as a
unit, for it permits the scorer to concentrate on the answer to a
single test question and to judge better the merits of the several
pupil responses to the same question.

5. Each question should be rated on a scale of ten, twenty, or a given
number of scoring points. The total score should be obtained for
each pupil by adding the scores on the different questions only
after all of the scoring has been done.


Whatever rules are followed, they will necessarily be arbitrary
and not always wholly defensible. The significant point in the use
of rules is that they provide for reasonable uniformity in handling 
the papers of all the pupils and also furnish a guide for the control 
of irrelevant factors that may affect the objectivity of the scoring. 
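
The bookkeeping implied by the suggestions above (papers identified by code rather than by name, each question scored across all papers before the next, totals added only at the end) can be sketched as follows. The scoring key, the part labels, and the record of which parts appear in each paper are all hypothetical; deciding whether a part is present in an answer remains the scorer's judgment and is not automated here.

```python
# Hypothetical scoring key: the ideal answer to each question is divided
# into parts, and each part carries a fixed number of points (10 per question).
SCORING_KEY = {
    "Q1": {"names cause": 4, "gives example": 3, "states effect": 3},
    "Q2": {"defines term": 5, "applies term": 5},
}

# Hypothetical record of the parts the scorer found in each paper.
# Papers carry code numbers only, so the pupil's name plays no role.
PARTS_FOUND = {
    "paper-01": {"Q1": ["names cause", "states effect"], "Q2": ["defines term"]},
    "paper-02": {"Q1": ["names cause", "gives example", "states effect"],
                 "Q2": ["defines term", "applies term"]},
    "paper-03": {"Q1": [], "Q2": ["applies term"]},
}


def score_by_question(key, parts_found):
    """Score one question across every paper before moving to the next."""
    per_question = {}
    for question, part_points in key.items():        # outer loop: questions
        per_question[question] = {}
        for paper_id, found in parts_found.items():  # inner loop: papers
            credited = found.get(question, [])
            per_question[question][paper_id] = sum(
                part_points[part] for part in credited
            )
    return per_question


def total_scores(per_question):
    """Add the question scores only after all questions have been scored."""
    totals = {}
    for scores in per_question.values():
        for paper_id, points in scores.items():
            totals[paper_id] = totals.get(paper_id, 0) + points
    return totals


per_question = score_by_question(SCORING_KEY, PARTS_FOUND)
totals = total_scores(per_question)
for paper_id in sorted(totals):
    print(paper_id, totals[paper_id])
```

Spelling, handwriting, and sentence structure deliberately have no entry in the key, so they can neither raise nor lower a score.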


19 J. Wayne Wrightstone, “Are Essay Examinations Obsolete?” Social Education,
1:410-15; September 1937.
Suggestions for improving the essay examination


Four conditions appear to be necessary in bringing about improve- 
ment in the teacher-made examination of the essay type. These 
conditions are: 


(1) The exact purpose of the examination must be understood by
both the teacher and the pupil. The emphasis of the essay examination
should be definitely on thought, reasoning, and other
types of mental activity as applied to the materials of the course.
The main concern is with topics that involve interest-centers or
relationships and problematical issues. Answers to questions involving
judgments, synthesis, and generalizations are admittedly
difficult to evaluate, but they reveal aspects of pupil mastery
and mind quality probably not obtained from other types of
responses.


(2) The content of the examination should be governed by its purpose.
In general, a test should parallel the objectives and pupil
outcomes of the course. This means that there should be a proper
balance of test content not only with respect to the subject
matter but also with respect to the types of abilities to use and
apply informations that are desired pupil outcomes.


(3) The preparation and selection of suitable essay-type questions
should consume at least as much time as is required to score the
answers. If this is done, the value and the accuracy of the scores
obtained are almost certain to be increased.


(4) Definite rules should be formulated that will as far as possible
control irrelevant factors in scoring the papers. The careful use
of scoring rules will bring about a definite decrease in the inaccuracy
of the pupil scores.


The accompanying tentative score card for rating essay-type
questions is suggested as a possible means of improving this type of
teacher-made examination by calling attention to the desirable qualities
in test questions. Unless a question rates “Yes” on at least seven
of the ten items, it is certainly of doubtful value and should probably
be rewritten and given a new emphasis or be completely eliminated
from the examination.
Tentative Score Card for Rating Essay-Type Examination Questions

(Rating columns on the card include “Yes” and “Slightly.”)


1. Is the question concerned with important
phases of the subject?

2. If the question emphasizes minor details,
are they useful in linking up other
facts, ideas, theories, involved in the
subject?

3. Does the question give emphasis to
evaluation and to relational thinking?

4. Is the question apparently of a suitable
degree of difficulty in relation to the
other questions in the test?

5. Is the question stated in such a way as
to stimulate thought, to challenge the
interest of the pupils?

6. Does the question motivate the pupil
to integrate his ideas around certain
interest-centers?

7. Is the question stated in such form as
to cause the pupil to sample widely
into his background of fact?

8. Does the question call for any originality
of thought organization and
expression?

9. Does the question call for the pupil to
integrate facts gained from different
sources?

10. Is the question limited sufficiently that
the pupil has some chance of writing
what he really knows about it in a
reasonable time?
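
The seven-of-ten rule stated for the score card can be applied mechanically once a question has been rated on each item. The sketch below is a minimal illustration; the rating labels other than "yes" and the sample ratings are hypothetical, and any rating that is not "yes" is simply treated as not counting toward the seven.

```python
ITEM_COUNT = 10    # the score card has ten items
REQUIRED_YES = 7   # "Yes" on at least seven of the ten items, per the text


def review_question(ratings):
    """Return the number of 'yes' ratings and the suggested disposition.

    ratings: a list of ten strings, one per score-card item.
    """
    if len(ratings) != ITEM_COUNT:
        raise ValueError(f"expected {ITEM_COUNT} ratings, got {len(ratings)}")
    yes_count = sum(1 for r in ratings if r.strip().lower() == "yes")
    verdict = "keep" if yes_count >= REQUIRED_YES else "rewrite or eliminate"
    return yes_count, verdict


# Hypothetical ratings for one draft question.
draft = ["yes"] * 6 + ["slightly"] * 3 + ["no"]
count, verdict = review_question(draft)
print(f"{count} of {ITEM_COUNT} items rated yes: {verdict}")
```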


Topics for Discussion 


1. Indicate why there is need for improvement in classroom testing.

2. What are some of the major weaknesses of the oral examination for 
testing purposes? 

3. What uses should the oral quiz be expected to serve in the school? 

4. Discuss fully the manner in which limited sampling reduces the 
reliability of the essay examination. 

5. List and discuss several factors that contribute to subjectivity of 
scoring the typical essay examination. 

6. Comment upon some of the minor weaknesses of the essay test. 

7. List and evaluate the advantages that have been attributed to the
traditional examination.

8. What are your conclusions concerning the proper place of the essay
test in classroom measurement? 

9. Identify some of the types of essay questions and indicate key words 
by which they are introduced. 

10. Suggest at least five specific devices or procedures for increasing
the objectivity of scoring essay-type tests. 

11. Outline testing procedures by which the essay-type test may be 
made more effective as a classroom testing technique. 
Constructing and Using Informal Objective Tests


THIS CHAPTER deals with the following points concerning the construction
and classroom use of informal objective tests:


A. Similarities of the informal objective examination and the
standardized test.

B. Major advantages and limitations of the teacher-made objective
examination.

C. Types of instructional outcomes.

D. Selecting the content and preparing an informal objective
test.

E. Administering and scoring the informal objective test.

F. Uses and limitations of basic objective item forms.

G. Illustrations of objective-test item types.

H. General suggestions for constructing objective test items.

I. Suggestions for constructing basic types of recall and recognition
test items.

Developments contributing to the improvement of measurements 
in education have largely followed two main lines: (1) the construc- 
tion and improvement of standardized tests, and (2) the improve- 
ment of teacher-made tests. It is with the second of these that this 
chapter is concerned. In many respects these two types of measure- 
ment are not fundamentally different. Both utilize samplings of 
material to stimulate pupil reactions. In both the performance is 
expressed in terms of a score. Both make use of exercises that are 
characterized by being objective. Yet, in spite of these similarities,
the two types of tests do not seriously overlap in function. 


1 CHARACTERISTICS OF CLASSROOM TESTING


Importance of classroom testing 


Even though standardized instruments for measuring achievement 
of school children have come into wide use, the examination con- 
structed by the teacher still remains the most frequently used means 
of measuring the achievement of pupils. Although properly con- 
structed standardized educational tests may be superior in certain 
respects to teacher-made examinations, they should never displace 
the teacher-made test as a means of measuring the results of teaching 
as indicated by pupil attainment. The teacher frequently has need 
for a measuring instrument adapted to a particular course of study 
or to the instructional emphasis that has been given to the subject 
in the teaching of a particular class. The informal objective examina- 
tion constructed by the teacher to fit the instruction the class has 
been receiving is the obvious answer. 


Standardized vs. non-standardized objective tests 


Standardized educational tests are structurally not fundamentally 
different from informal objective examinations in their basic elements. 
In fact, standardized educational tests are essentially little more than 
improved and refined objective examinations. 

In contrast with their similarities from a structural point of view,
the functions of the standardized test and the informal objective 
examination over the same material are quite distinct. The standard- 
ized test, because it is intended for use in many different school sys- 
tems and in connection with many different types of courses of study, 
must be general as to content. The maker of a standardized test 
cannot be sure that its content will actually parallel the instructional 
emphasis given the subject in the course offered by any individual 
teacher. Accordingly, the standard test is useful mainly for general 
comparisons of school with school, class with class, or city with city. 
It is not designed for use in evaluating the accomplishment of pupils 
in a class under a particular instructor with a specialized instructional 
emphasis. By the same reasoning, the standardized test should probably
not be used as the basis for the assignment of course marks in
any subject. 

The informal objective examination, constructed in accordance 
with well-recognized principles and incorporating extensive samplings 
of the content actually taught by the teacher and the activities of 
his pupils, is, on the other hand, a suitable basis for the assignment 
of such course marks. It is quite probable that even though two ob- 
jective tests, one standardized and one informal, could be made equal 
in objectivity, length (in terms of number of items as well as testing 
time), and reliability of measurement, their functional values in the 
classroom would still be quite unlike, because of unavoidable dif- 
ferences in their content alone. Thus, in general, standardized and 
informal objective tests must be considered as having quite distinct 
and separate functions, and the terms are not to be used inter- 
changeably. 

Tyler was the leader some twenty years ago in a movement to 
broaden the base for informal objective testing. He pointed out that 
test content had been validated primarily in terms of the informa- 
tional content of the courses tested, and recommended a procedure 
that validated test content in terms of course objectives. Tyler's 
recommendations for procedures to be followed in achievement test 
construction are reproduced below without discussion at this point.1
Recent enlightened attacks upon construction of both informal ob- 
jective tests and standardized tests have doubtless been influenced 
significantly by this point of view. 


1. Formulation of course objectives.

2. Definition of each objective in terms of student behavior.

3. Collection of situations in which students will reveal presence or
absence of each objective.

4. Presentation of situations to students.

5. Evaluation of student reactions in light of each objective.

6. Determination of objectivity of evaluation.

7. Improvement of objectivity, when necessary.

8. Determination of reliability.

9. Improvement of reliability, when necessary.

10. Development of more practical methods of measurement, when
necessary.


1 Ralph W. Tyler, “A Generalized Technique for Constructing Achievement
Tests.” Educational Research Bulletin, 10:199-208; April 15, 1931.
2 ADVANTAGES AND LIMITATIONS OF INFORMAL OBJECTIVE TESTS


The foregoing discussion has pointed out that the standardized 
and the informal objective test are closely similar in the form in 
which the test items are stated. In fact, both types of examinations 
make use of the same general principles in the formulation of their 
content. Both the standardized and the non-standardized tests may 
include enough items to afford consistent measurement. On the other 
hand, there are a few very distinct differences between the essay 
examination and the teacher-made objective examination. In general, 
the advantages of the informal objective test are in the areas in which 
the essay test has definite limitations and perhaps to a less extent the 
weaknesses of the informal objective test are in the areas where 
the essay test is relatively satisfactory. Therefore, the treatment 
below is related to that of Section 3 of Chapter 6 and will in some 
instances depend upon the previous discussion. 

Because of the similarities between the teacher-made objective 
test and the standardized test noted above, the treatment of test construction
here applies almost equally well to both forms of objective
test. Their major differences lie in the purposes for which they are 
constructed and in the uses to which they are properly put, whereas 
the discussion here is based more on the form of the types of tests 
being contrasted. 


Advantages of informal objective tests 


Of the several merits of the informal objective test, the two most 
important are answers of the early objective testers to the two major 
criticisms of the essay examination discussed above—limited sam- 
pling and subjectivity of scoring.

Extensive sampling. Although all tests measure only samples of 
pupil performance, the objective test by its nature samples so widely 
that the results obtained from its use closely approximate those that 
would be obtained if pupil performance in the subject in question 
could be measured completely. A test made up of a hundred or so 
short, well-selected questions or items will adequately sample pupil 
achievement for many purposes. 
The results from administering a test consisting of many items of
narrow range are shown in Figure 10. This illustration is based on
the same hypothetical situation as that of Figure 8 on page :43,
and the same basic conditions apply. The shaded portions of the five
rectangular areas represent the portions of the total course content
mastered by Pupils A, B, C, D, and E. In each instance the shaded
portion is exactly half of the area of the rectangle. The close-ruled
horizontal lines represent the 20 short-answer questions, which are
so distributed as to cover the content of the entire course.


Fig. 10. Effect of extensive sampling on test scores. (Percentage
scores for Pupils A, B, C, D, and E: 50%, 55%, 50%, 55%, 50%.)


It is apparent here, in contrast with the results shown in Figure
8 when only four questions were used, that the five pupils receive
scores that are very similar. Since exactly half of the twenty lines
are opposite the shaded areas for Pupils A, C, and E, they receive
scores of 50 per cent. As eleven of the twenty marks are opposite the
shaded areas for Pupils B and D, they receive scores of 55 per cent.
These results show that the objective test samples widely, and that
scores resulting from its use are not likely to be much affected by
differences in the particular portions of the content that different
pupils happen to have mastered. Enough different
questions are asked to make sure that the mark made by each pupil 
will place him quite accurately in relation to his classmates in terms 
of his knowledge of course content. This is in direct contrast to the 
findings based on the illustration of Figure 8, which were that wide 
differences occurred in the scores assigned to the five pupils. 
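
The contrast between four questions and twenty can be reproduced with a small model. The sketch below assumes a course of 100 content units, five pupils who have each mastered exactly half of them, and questions that each test a single unit. The mastery patterns and question positions are invented for illustration and are not those of the book's Figures 8 and 10.

```python
def units(*spans):
    """Build a set of content units from inclusive (first, last) spans."""
    covered = set()
    for first, last in spans:
        covered.update(range(first, last + 1))
    return covered


# Hypothetical mastery patterns: each pupil knows exactly 50 of 100 units,
# but a different 50.
MASTERED = {
    "A": units((0, 49)),
    "B": units((0, 14), (35, 64), (85, 89)),
    "C": units((15, 34), (65, 84), (90, 99)),
    "D": units((0, 19), (30, 44), (55, 69)),
    "E": units((20, 36), (38, 61), (80, 88)),
}

# Each question tests one content unit (an assumption of this sketch).
FOUR_QUESTIONS = [12, 37, 62, 87]
TWENTY_QUESTIONS = list(range(2, 100, 5))   # units 2, 7, 12, ..., 97


def percent_scores(questions):
    """Percentage of the questions that fall within each pupil's mastered units."""
    return {
        pupil: 100 * sum(q in known for q in questions) // len(questions)
        for pupil, known in MASTERED.items()
    }


few = percent_scores(FOUR_QUESTIONS)
many = percent_scores(TWENTY_QUESTIONS)
for pupil in MASTERED:
    print(f"Pupil {pupil}: 4 questions -> {few[pupil]}%, "
          f"20 questions -> {many[pupil]}%")
print("spread with 4 questions:", max(few.values()) - min(few.values()))
print("spread with 20 questions:", max(many.values()) - min(many.values()))
```

With equal true mastery, the spread of scores on the short test reflects only the accident of which units happened to be asked about; the longer test leaves far less room for that accident.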
Objectivity of scoring. In an objective test the items are so stated
that the answers are brief, and usually only one correct answer is 
possible. A highly objective test may be scored repeatedly by one 
person or it may be scored by a large number of different persons 
with practically no disagreement in the scores assigned. Thus in the
objective examination the responses can be evaluated on an im- 
personal basis, entirely independent of the personal judgment of the 
examiner. This is true, of course, only when the items are constructed 
in accordance with certain recognized principles. These principles 
are listed and discussed in Section 5 of this chapter. 

Economy of time. The form in which the objective item is stated
makes it possible for the pupil to record his response definitely and 
briefly. This in turn permits many specific reactions to be called for 
in a relatively brief period of working time. In this way a much 
wider area of the course content can be sampled in a given period, 
resulting in a higher reliability of measurement per unit of working 
time. 

The conciseness of the pupil’s response makes it possible for the 
scoring of the tests to be done very accurately and speedily. If the 
objective examinations are made in accordance with the best prac- 
tices, they can be scored by simple keys and stencils in the hands of 
ordinary clerical help. Informal objective tests are readily adaptable 
to machine scoring.

Elimination of bluffing. Fluency of expression and mastery of the 
language have always been recognized as factors in examinations of 
the discussion type. Because of the nature of the items, the amount 
of writing done by pupils in responding to objective tests is reduced 
to a minimum, however. This practically eliminates bluffing and the 
advantage that rapid and fluent writers have over those not so gifted. 
The fact that one pupil can write more material than another in the 
same length of time should not result in his receiving higher marks
in his school subjects. 


Possible disadvantages of informal objective tests 


A number of rather important criticisms of objective examinations 
have been brought forward by teachers and critics. The following 
list, while not complete, probably contains the more significant of
these objections.

Neglect of training in expression of thought. Teachers sometimes
feel that the informal objective test inadequately allows opportunity 
for the pupil to organize and express his thoughts. One approach to 
this criticism is through an analysis of how well the essay test fulfills 
these purposes. The amount of such training that is derived from 
writing an essay examination is negligible at best. The time stress 
under which the pupil typically writes his examinations gives him 
very little opportunity carefully to think his way through what he 
actually knows about the subject. He has almost no time to consider 
sentence structure, paragraph organization, or vocabulary choice. 
The net result is that he forms bad rather than good habits of 
thought, expression, and work. 

Some objective methods are available for testing the ability of the 
pupil to organize his thoughts, but no claim should be made that the 
objective test provides opportunity for the verbal expression of or- 
ganized thought. The written examination should be expected to serve 
no such purpose. The opportunity for training the pupils in self- 
expression can and should be provided adequately elsewhere in the 
school program. 

Overemphasis on factual knowledge. This objection to the ob- 
jective examination overlooks the fact that almost uniformly essay 
questions test memory for the factual aspects of the subject. The 
thought question as a type is not at all inherent in the essay ex- 
amination. Furthermore, there is nothing in the objective form which 
makes impossible the construction of items that stimulate critical 
and constructive thought. Many teacher-made tests do not contain 
such thought-provoking items, but that does not mean that they can- 
not be made to do so when teachers become masters of objective 
techniques and learn to think deeply enough into the validation of 
their tests. The informal objective examination can be used, as is 
brought out later in this chapter and in subsequent chapters of this 
volume, in the measurement of various instructional outcomes of 
significance far beyond the acquisition of facts and of basic skills. 
It is probable that one source of this criticism lies not so much in 
objective methods of measurement in general as it does in the kinds 
of objective material typically prepared by the individual teacher. 

Encouragement of guessing. Some teachers and critics believe that 
there is a tendency for the objective test to encourage guessing to 
an undue extent. The objective examination form admittedly per- 
mits, but does not necessarily encourage, guessing. In fact, it may 
tend to discourage guessing through its emphasis upon exact knowledges
and correct applications and interpretations of factual data,
and in its use of correction-for-guessing formulae in scoring test
results. Furthermore, it is probable that few guesses on objective 
tests are based on pure chance. Rather they are based on slight bal- 
ances of evidence on one side or the other of an issue on which the 
pupil is uncertain. Many life activities, as a matter of fact, are based
on chances considerably less than certain of a given outcome. There- 
fore, it seems that guessing in the sense of weighing available evi- 
dence and making the best decision possible is neither injurious to 
the pupil nor a bad influence upon examination results. 

Difficulty of preparation. The criticism that informal objective 
tests are difficult to prepare is frequently made. The typical essay 
test is easy to prepare but difficult to score. The informal objective 
test may be difficult to prepare but it is certainly easy to score. When 
the advantages accruing to the use of objective tests are balanced 
against the difficulty of preparing them, the conclusion seems favor- 
able rather than otherwise to the objective test. 

Considerable cost. Experience in the use of objective examinations 
indicates that they are most valuable when available for classroom 
use in printed or mimeographed form. Unquestionably the paper 
cost is an item of expense which in some school systems may be 
serious. However, some kind of paper must be used for the examina- 
tion. Mimeograph paper is approximately as cheap as any. If the 
teacher is willing to do his own mimeographing or hektographing, 
the extra expense should not be very great. As a matter of actual 
fact, the cost of preparing objective examinations probably represents 
one of the very minor items of expense in the average school system 
when it is considered in terms of the real educational importance
of such equipment.


3 CONSTRUCTION AND USE OF INFORMAL OBJECTIVE TESTS 


The problems of constructing and using informal objective tests 
are discussed in this and the following section of this chapter. Treated 
here are the general issues that should receive consideration from 
the time a test is in the planning stage to the time when its results 
have been used finally in the validation of its individual items. The 
following section deals with the various major objective item forms
somewhat in detail and presents samples of the various item types.
Section 5 gives general and specific suggestions for drafting items
of the five basic types. 


Types of instructional outcomes 


A classification of instructional outcomes by major types is useful 
in obtaining an optimum balance of test content and in making sure 
that certain types of outcomes are not overstressed at the expense of 
other and perhaps even more fundamentally important outcomes. 
The types of outcomes² which may be distinguished are: (1) skills,
(2) knowledges, (3) concepts, (4) understandings, (5) applications, 
(6) tastes and preferences, and (7) adjustment. 

Tastes and preferences are best represented by attitudes, interests, 
and appreciations. The individual's feelings and emotions are much 
more involved in formulating his tastes and preferences than are the
more largely intellectual processes operative in determining his 
knowledges, concepts, and understandings. Whether adjustment 
should be considered as a type of instructional outcome or the re- 
sultant of all of the other types of outcomes is not certain. In any 
event, since attitudes, interests, and adjustment are ordinarily considered
to fall in the area of personality measurement, they are discussed
in Chapter 11. Appreciations, not directly measurable by the
usual paper-and-pencil achievement tests but rather subject to appraisal
by the use of more complex evaluative tools and techniques,
are dealt with in Chapter 9. 

The five remaining types of outcomes—skills, knowledges, con- 
cepts, understandings, and applications—are briefly characterized 
below. It seems certain that care in attaining an optimum balance 
among these types of outcomes in test construction will result in 
more valid informal objective tests than might otherwise be attained. 
It should be borne in mind, however, that subject areas and even
specific subjects differ widely in their objectives and hence in the 
behavioral outcomes to be expected of pupils. For example, such 
tool subjects as mathematics, the expressive language arts, and the 
practical arts appropriately emphasize skill outcomes more than do 
such content subjects as the social studies and the sciences. Knowledges,
concepts, and understandings appropriately receive somewhat
more emphasis in the content subjects than in the skill areas. The
consequence of such differences is that an optimum balance among
these types of outcomes in test construction is attainable only by
careful consideration of the appropriate objectives and hence of the
appropriate instructional outcomes separately for each course and
in the usual case individually by each teacher.

2 Adapted from Asahel D. Woodruff, The Psychology of Teaching, Third edition.
Longmans, Green and Co., New York, 1951. Chapter 16. See also I. N. Thut and
J. Raymond Gerberich, Foundations of Method for Secondary Schools. McGraw-Hill
Book Co., Inc., New York, 1949. p. 107-12.

Skills. Physical or motor activity is definitely involved in many
of the types of behavior considered under this heading, although this 
does not preclude the operation of mental processes in skill behavior. 
Reading skills, work-study skills, language skills, computational 
skills, shop and laboratory skills, typing skills, and athletic skills 
are representative of the variety and of the scope covered by this 
type of outcome. 

Knowledges. Attainment of knowledge involves the establishment 
of such mental associations as those between an object and its name, 
a date and an event, a term and the color or characteristic it represents,
and a symbol and its meaning. Outcomes of this type are represented
by knowledges concerning facts, principles, and laws,
knowledges concerning processes and procedures, and knowledges 
concerning sources of information. 

Concepts. Concepts presuppose that meaning has been attached to 
what has been learned, whereas purely knowledge outcomes may be, 
but are not necessarily, organized at the conceptual level. Abilities 
to give the meanings of words, to discriminate types or qualities of 
color, and to use abstract words in thinking, speaking, and writing 
demonstrate the attainment of concepts. The emphasis in modern 
schools on the development of meanings represents the attempt to
develop this type of instructional outcome. 

Understandings. Knowledge alone, embodied in the psychologically
unsound truism that “knowledge is power,” represents a much lower
and less functional instructional outcome than that referred to in
the psychologically sound statement that “understanding passeth
knowledge.”³ Knowledge without power, or without understanding
of its significance, is useless. Understandings are probably similar to
but at a higher level than concepts. Understandings even more than
concepts appear to be essential prerequisites to the functional use
of what has been learned. Modern schools are increasingly stressing
the development of understandings in pupils.

3 Harl R. Douglass and Herbert F. Spitzer, “The Importance of Teaching for
Understanding.” The Measurement of Understanding, Forty-fifth Yearbook of the
National Society for the Study of Education, Part I. University of Chicago Press,
Chicago, 1946. p. 7.

Applications. Abilities of pupils to apply the results of learning 
in logical thinking and in solving problems represent an end product 
or an ultimate goal of teaching. Skills, knowledges, concepts, and 
understandings all contribute to the attainment of this outcome. The 
development of realistic tastes and preferences is also prerequisite to 
the effective use of what has been learned in a functional situation. 
Logical thinking and problem-solving are not limited to mathematics, 
where the terms have perhaps most often been applied, but extend 
to any area, whether social, economic, political, or scientific, in which 
problems exist and decisions are to be made. 


Content of the informal objective test 


It is highly important that the test be definitely based upon the 
objectives and outcomes of the course, and also upon the course 
content. It is true, naturally, that content is basic to a test, and 
furthermore that the best source of content material is found in the 
course itself. However, the measurement of factual knowledges, and 
the assumption that the pupil is necessarily able to use the knowledges
he has acquired or been modified by, are unsound. Tyler⁴ found
that knowledge of facts and ability to apply principles to new 
situations are related only to the degree shown by an average cor- 
relation coefficient of not much above .25 in science courses at Ohio 
State University. Therefore, not only should the test be so con- 
structed as to measure the degree of attainment of the pupils in the 
desired outcomes but it should do so by means of test situations 
that involve the ability to apply and use facts as well as knowledge 
of facts. 

Care should be taken to sample course content widely and im- 
partially in the selection of materials for a test. It is also ordinarily 
desirable to use more than one type of objective item in the test, but, 
on the other hand, not to use too great a variety of item types. For 
ordinary classroom tests given during one period, two or three types 
might be used; for longer examinations, variety might be increased
by using four or five types or modifications. It should be kept in
mind that the subject matter itself is often a factor limiting the types
of items used. Since recall items place a greater demand upon the
pupil's memory of specific facts than is true of recognition items, it
might well be expected that the pupil would recognize the accuracy
of certain facts presented to him but not necessarily be able to
recall the facts without clues. Therefore, recall items should be used
only for important facts.

4 Ralph W. Tyler, “Identification and Definition of the Objectives To Be Measured.”
The Construction and Use of Achievement Examinations. Houghton Mifflin
Co., Boston, 1936. p. 7.

The test maker usually finds it advantageous first to construct
items that fall into large groupings, such as matching exercises, and 
then to construct items having narrower scope. It is also desirable to 
construct multiple-choice items prior to alternate-response forms. 
This does not mean that all matching and multiple-choice items 
should be constructed before any true-false or simple recall items are 
made, but rather that first consideration should be given for a certain 
fact or relationship to the possibility of its use in an item form which 
is not so flexible and widely applicable as are the true-false and simple 
recall. If a particular idea does not, for example, readily combine 
with other similar relationships into a matching exercise and does 
not furnish enough plausible alternative responses for use in multiple- 
choice form, it might immediately be set up in one of the simpler 
forms. 

The teacher will find it advantageous, for reasons that will be 
brought out clearly below, to write each item or each test unit on a 
filing card or slip. Alternate-response, multiple-choice, and simple 
recall items should be put on separate cards. Paragraph completion 
and matching exercises should be written on cards in their entirety, 
for such test units cannot be broken down by items for listing on 
separate cards. It is possible and desirable to code these cards in 
terms of the content they cover and also to keep records of the use 
of each item in the test and its validity. More will be said of these 
last two points in a later section of this chapter. 


Assembling and preparing the informal objective test 


After the test items have been constructed, they should be sorted 
by types and carefully evaluated in their new settings. There should 
be a minimum number of items which all pupils can answer correctly 
or for which no pupils can get the correct answers. A difficulty level 
averaging about 50 per cent is recommended by Lindquist⁵ as most
satisfactory. Items should therefore range from that point toward
very hard and toward very easy. If there should be too few items 
of a certain type for a section of a test, those items should be re- 
drafted to fit into one of the sections definitely decided upon. 

Test length depends on many factors other than the nature of the 
test items and the amount of time available for testing, but these are 
basic issues to be considered in the preparation of a test. The test 
should be of such length that all or very nearly all of the pupils can 
complete it before the end of the testing period. Recommendations 
have been made concerning the number of items of each type that 
can be given per unit of time at various age levels. From the avail- 
able evidence it is impossible to determine in advance the exact 
working time required for a given form of objective examination. 
However, a reasonable estimate may be reached by allowing one 
minute of working time for each two recall items, each two multiple- 
choice items and each three true-false items. Such recommendations 
seem to have only very general significance, however, for the diffi- 
culty of the items and the age level of the pupils have much to do 
with time requirements, and teachers vary a great deal in the types
of items they construct. The teacher will learn after brief ex- 
perimentation how long a test should be for a given period of time. 
The number of items can be determined automatically by the num- 
ber that have been constructed when the teacher considers the test 
to be complete and adequate and of proper length for the testing 
period. It is, however, important that a fairly large number of items 
be used in all objective tests. 
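
The rule of thumb stated above (one minute of working time for each two recall items, each two multiple-choice items, and each three true-false items) can be turned into a rough calculation. The following Python sketch is only an illustration; the function name and the item-type labels are assumptions introduced here, and the result carries no more precision than the text claims for the rule itself.

```python
# Items that can be answered per minute of working time,
# taken from the rule of thumb in the text.
ITEMS_PER_MINUTE = {"recall": 2, "multiple_choice": 2, "true_false": 3}


def estimate_working_minutes(item_counts):
    """Return a rough working-time estimate in minutes.

    item_counts maps an item-type label (hypothetical names) to the
    number of items of that type planned for the test.
    """
    total = 0.0
    for item_type, count in item_counts.items():
        total += count / ITEMS_PER_MINUTE[item_type]
    return total


# Example: 20 recall, 30 multiple-choice, and 45 true-false items.
minutes = estimate_working_minutes(
    {"recall": 20, "multiple_choice": 30, "true_false": 45}
)
print(minutes)  # 10 + 15 + 15 = 40.0
```

As the text cautions, item difficulty and the age level of the pupils affect time requirements, so such an estimate is at best a starting point to be corrected by classroom experience.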

Items should be arranged in parts or sections according to type 
in the final test. There is little agreement among test workers con- 
cerning the best arrangement of items for informal objective tests. 
Some prefer arrangement of items in each part by an increasing 
order of difficulty. If this method is used, the teacher’s judgment 
concerning item difficulty is the only basis for arrangement when 
items are first used. Item-counting procedures furnish evidence on 
difficulty after items have been used with a class. Other persons 
prefer to arrange the items topically within each section of the test, 
and to consider item difficulty in the arrangement of items only by 
introducing the test by a few very easy items so that pupils will
not become discouraged before they get well started. The authors
believe that either organization of the test is satisfactory and that
the individual teacher should use the method which adequately meets
the conditions under which he uses the informal objective examination.

5 E. F. Lindquist, “The Theory of Test Construction.” The Construction and
Use of Achievement Examinations. Houghton Mifflin Co., Boston, 1936. p. 32-33.

The examination should be prepared for use with the pupils by a 
mimeographing or other method of reproduction if possible. Some 
item types can be given orally if absolutely necessary, and the black- 
board can be used for short quizzes. Complete directions to the 
pupils should always be provided. This sometimes entails general 
instructions at the beginning and separate directions for each part 
of the test. If the item forms are difficult to understand or if pupils 
are taking objective tests for the first time, samples showing how 
they are to record their answers should be given with the directions. 
The samples should be so simple in content that they will be readily
comprehended by all pupils. Illustrations of directions to pupils and 
of samples to demonstrate methods of answering test items are given 
later in this chapter. 

Pupils should be told in the directions whether or not to guess, and 
should also be told how the test will be scored. The most common 
procedures and those usually recommended are to instruct the pupils 
not to guess and then to correct their scores for guessing on alternate- 
response items. On the other hand, pupils are usually told to attempt 
each item on the matching test. 


Administering and scoring the informal objective test 


Little need be said here concerning the administration of the in- 
formal objective test except to point out that if the directions to 
pupils and any necessary sample items are carefully and well pre- 
pared the actual administration of the test is simple indeed. The 
teacher should be careful not to give intentional or unintentional 
assistance to individual pupils by answering any questions they may 
ask. The safest procedure is to make certain that the pupils under- 
stand how to take the test by careful preparation of the directions, to 
make sure that individual test items require no explanations by 
framing them with care, and then to answer no questions about word 
meanings or interpretations to be placed on certain items while the 
test is in progress. Pupil questions concerning typographical errors 
they may encounter in the test should be investigated and the attention
of the entire class should be called to any such errors that
might within reason cause misinterpretations of items. 

Scoring of the test should be by the predetermined method, and 
should vary with the type of objective item. Scoring keys can be 
prepared easily by using a copy of the test and cutting it into strip 
keys and cutout stencils as required. With such keys available, the 
actual mechanics of scoring the tests are very simple. Each correct
answer should ordinarily be given one point of credit. It will be 
advantageous to mark each correct answer with a colored pencil for 
later use for instructional purposes. 

Chances of guessing the correct answers vary with different item 
forms. There is little if any chance of guessing, or at least of making 
a pure guess, on recall item forms. Obviously, the chance is 50-50 on 
an alternate-response item, but it is only one in five for a multiple-choice
item with five alternatives. The correction for chance formula
is

\[ \text{Score} = \text{Rights} - \frac{\text{Wrongs}}{N - 1} \quad \text{or} \quad R - \frac{W}{N - 1} \]

where N represents the number of possible answers to an item.
For the true-false item, this becomes R − W. For multiple-choice
items of 3, 4, and 5 alternatives, the formula becomes respectively

\[ R - \frac{W}{2}, \qquad R - \frac{W}{3}, \qquad R - \frac{W}{4} \]

Correction for chance is ordinarily used with the true-false test and 
the multiple-choice test consisting of items that have as few as three 
alternatives. It need not necessarily be used with multiple-choice 
items having four or more alternatives, as the chance of making a 
correct guess is not great in such tests. Matching tests are not cor- 
rected for chance, for little opportunity for guessing exists if they are 
properly constructed. 
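
The correction for chance described above, rights minus wrongs divided by one less than the number of possible answers, is simple enough to express as a short computation. The Python sketch below is illustrative only; the function and parameter names are assumptions made for this example. Because the formula uses only rights and wrongs, omitted items do not enter into the corrected score.

```python
def corrected_score(rights, wrongs, n_alternatives):
    """Apply the correction for chance: R - W / (N - 1).

    n_alternatives is N, the number of possible answers to each item.
    """
    if n_alternatives < 2:
        raise ValueError("an item needs at least two possible answers")
    return rights - wrongs / (n_alternatives - 1)


# True-false items (N = 2): the formula reduces to R - W.
print(corrected_score(40, 10, 2))  # 30.0
# Three-alternative multiple-choice items: R - W/2.
print(corrected_score(40, 10, 3))  # 35.0
# Five-alternative multiple-choice items: R - W/4.
print(corrected_score(40, 12, 5))  # 37.0
```

The examples show how quickly the size of the correction shrinks as alternatives are added, which is consistent with the statement that the correction need not necessarily be used for items having four or more alternatives.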

There should be no attempt to weight individual items of a test 
differently according to their importance or difficulty. A summary 
of various studies dealing with this question leads to that conclusion.⁶


6 J. Murray Lee and Percival M. Symonds, “New-Type or Objective Tests: A
Summary of Recent Investigations (October 1931-October 1933).” Journal of
Educational Psychology, 25:161-84; March 1934.

It may be desirable in some instances, however, to assign varying
weights to the scores resulting from different parts of the test in 
order to account for differences in difficulty or average time required 
per item, in which case the most satisfactory procedure is probably 
to multiply by 2 or by 3 the scores from test parts that are thought 
to be deserving of extra weighting. 
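
The weighting of whole test parts, as distinguished from individual items, amounts to multiplying each part score by a small whole number before the part scores are added. A minimal Python sketch follows; the part names and the particular weights are hypothetical and serve only to illustrate the arithmetic.

```python
def weighted_total(part_scores, part_weights):
    """Sum part scores, multiplying each by its weight (default weight 1)."""
    total = 0
    for part, score in part_scores.items():
        total += score * part_weights.get(part, 1)
    return total


# Hypothetical test with three parts; only the multiple-choice part,
# thought to require more time per item, is given double weight.
scores = {"true_false": 30, "multiple_choice": 18, "matching": 9}
weights = {"multiple_choice": 2}
print(weighted_total(scores, weights))  # 30 + 36 + 9 = 75
```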


Anticipating future testing needs 


For the teacher who repeats courses annually or more than once 
each year, concern with a particular informal objective test should 
not end with the final direct use of the results. Informal objective 
testing is not economical of teacher time if the teacher starts afresh 
in the construction of every test over a period of years. Construction 
of informal objective tests should be a cumulative and selective 
process resulting in constant improvement of the tests actually used 
in the classroom. If tests are to be evaluated and improved in the 
manner suggested below, test booklets should not be returned to the 
pupils permanently. However, they may well be distributed for 
review purposes after the test has been scored, and collected when 
the instructional purpose has been accomplished, or used with in- 
dividual pupils in conferences concerning special points needing 
further emphasis in their work. 

As a means of determining the validities of individual items for 
future use, the teacher will find the method generally known as item-counting
of great value. One of the simple item-counting methods is
based on a division of the class into groups of above-average and 
below-average performance on the test, with about half of the 
class in each group. The test papers should then be sorted into 
corresponding groups. The number of correct responses to each test 
item by the pupils in each group can then be determined by a routine 
clerical procedure. This ordinarily involves the use of squared paper 
on which the columns represent the items of the test and the rows 
are used for checking the items correctly answered by each pupil. 
A summation of the check marks in each column for each of the two
pupil groups is then made. When the number of correct responses to 
each item is converted into a percentage of the number of pupils 
in the group, data essentially of the type shown in Table 2 of 
Chapter 5 become available. 
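
The clerical procedure just described (rows for pupils, columns for items, a tally of correct answers for each half of the class, and conversion to percentages) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescribed method: the function name, the pupil labels, and the handling of ties and of an odd-sized class are choices made only for this example.

```python
def item_percentages(responses):
    """Return per-item percentages of correct answers for each half of the class.

    responses maps a pupil label to a list of 1 (correct) or 0 (incorrect),
    one entry per test item. Assumes at least two pupils and lists of
    equal length.
    """
    # Rank pupils by total score, highest first; ties keep their input order.
    ranked = sorted(responses, key=lambda p: sum(responses[p]), reverse=True)
    half = len(ranked) // 2  # with an odd class, the extra pupil goes below
    groups = {"above": ranked[:half], "below": ranked[half:]}
    n_items = len(responses[ranked[0]])

    result = {}
    for name, pupils in groups.items():
        result[name] = [
            100 * sum(responses[p][i] for p in pupils) / len(pupils)
            for i in range(n_items)
        ]
    return result


# Hypothetical check sheet: four pupils, four items.
checks = {
    "Pupil A": [1, 1, 1, 0],
    "Pupil B": [1, 1, 0, 1],
    "Pupil C": [1, 0, 0, 1],
    "Pupil D": [0, 0, 1, 0],
}
table = item_percentages(checks)
print(table["above"])  # [100.0, 100.0, 50.0, 50.0]
print(table["below"])  # [50.0, 0.0, 50.0, 50.0]
```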

Such evidence is valuable to the teacher in determining which
test items properly discriminate pupil abilities by showing higher 
percentages of correct answers for above-average than for below- 
average pupils, which ones might be suspected of ambiguity or other 
faults because of failure to effect such discrimination, and which, 
if any, show reversals of the desired type of discriminative power. 
If the information concerning item validities thus obtained is recorded
on the cards suggested in an earlier section for individual test
items and groups of items, the cards become a valuable
file for use in the construction of future tests. Items that show
the proper type of discrimination can be retained, and those that 
discriminate in the wrong direction can be discarded or revised after 
critical examination reveals the source of their ambiguity or other 
weakness. Ultimately, the card file should include only test items 
that have been found satisfactory in actual classroom measure- 
ment. 
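
The three cases distinguished above (proper discrimination, failure to discriminate, and reversal) can be written as a simple decision rule for annotating the card file. The Python sketch below is illustrative only. The labels are hypothetical, and the strict comparison with no minimum margin is an assumption, since the text sets no threshold for how large a difference must be.

```python
def classify_item(pct_above, pct_below):
    """Label an item from its percentages correct in the two pupil groups."""
    if pct_above > pct_below:
        return "retain"  # discriminates in the desired direction
    if pct_above < pct_below:
        return "revise or discard"  # reversal of discriminative power
    return "examine for ambiguity"  # fails to discriminate


# Hypothetical percentages for three items.
above = [90.0, 60.0, 40.0]
below = [50.0, 60.0, 70.0]
labels = [classify_item(a, b) for a, b in zip(above, below)]
print(labels)  # ['retain', 'examine for ambiguity', 'revise or discard']
```

With a small class a difference of a few percentage points may reflect only one or two pupils, so the teacher's critical examination of each doubtful item remains the deciding step.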

A card file of this type can be used for the construction of new 
tests when the occasion arises, with assurance that the ambiguous 
items occurring in previous tests have largely been eliminated. It is, 
of course, desirable to add to the file as course content changes and 
to withdraw items which, although valid, are no longer applicable 
because of changing course content and objectives. Need for such 
constant turnover is greater in the social studies and sciences, in 
which current developments perhaps have the greatest immediate 
influence, than in subjects for which the content changes less rapidly, 
but it is undesirable for any course that objective classroom testing 
be allowed to become static. 

Although this procedure for validating test content may on the 
surface appear to be lengthy and somewhat involved, the teacher 
will realize significant dividends in improved pupil measurement by 
the use of it or some similar procedure. After such a system of keeping 
a cumulative test item file is once established, the teacher will realize 
the great saving in time and the increased testing efficiency that 
results. Time expenditure by the teacher is greatest for the typical 
essay test in the scoring of pupil results. Time expenditure by the 
teacher is greatest for the informal objective test in its preparation. 
Attention to the construction of good tests seems much more de- 
fensible than attention to the scoring of tests which in many instances 
are not satisfactory measurement instruments. 


4 TYPES OF OBJECTIVE ITEMS


The uses and limitations of the five basic types of objective items 
are discussed below and a few sample items⁷ are given to illustrate
each type. 


Simple recall items 


Simple recall test items cannot be definitely distinguished from 
completion exercises. The major distinctions appear to rest on com- 
plexity and length of the test unit and perhaps on the number of 
pupil responses called for. The simple recall form is by far the most 
widely used of the recall item types. It usually involves a very brief 
response by the pupil, such as writing a word, a number, a symbol, 
or a short phrase in a designated place in answer to a question or to 
complete a statement. 

Uses and limitations of simple recall items. The simple recall item 
is best adapted to the measurement of rather highly factual knowl- 
edges of the who, what, when, where types, and is very widely 
adaptable to different subject matter in such uses. It can be used to 
test the ability to identify things described or pictured, in which 
form it has rather wide range. In identification exercises, it is perhaps 
best adapted for use with maps and charts in the social studies and 
representations of biological structures in the natural sciences. It is 
useful in computational problem situations in arithmetic and the 
physical sciences. 

One of the major characteristics of the simple recall form is its 
apparent ease of construction, which tends to encourage wider use 
than is perhaps justified. Because of its tendency to measure factual 
knowledges rather than understandings, there is danger of over- 
weighting tests with factual materials if the simple recall item is too 
widely employed. This item is not readily adaptable to the measure- 
ment of abilities to apply facts, to perceive complex relationships, 
and to draw logical inferences. The simple recall form is readily 
understood by pupils because of its similarity to the essay question. 


7 For extensive samples of major item types and their modifications classified
by instructional outcomes and item types, see J. Raymond Gerberich, A Guide to
Achievement Test Construction: Specimen Objective Item Types. Longmans, Green
and Co., New York, 1954.

The simple recall item is difficult to score because of the tendency
for the responses to lack complete objectivity, even though responses 
may be provided for in terminal and aligned form. It is further limited 
by the fact that it is not directly adaptable to machine methods of 
scoring. 

Major types of simple recall items. The simple recall item is
perhaps most frequently presented in the form of a declarative state- 
ment with a blank in which the pupil is to write the correct com- 
pletion occurring at the end of the sentence. It also is frequently 
used, particularly in the lower grades, in the form of a question to be 
answered by the pupil on the line immediately following. Another 
form less widely used but satisfactory involves a list of terms or 
statements introduced by directions which tell the pupil to write on 
the line following each the other term or statement called for by 
the directions. 


Excerpt from National Achievement American History Test⁸


READ THE FOLLOWING SAMPLE:
The inventor of wireless was Marconi.
PRACTICE EXERCISE:
The first President of the United States was ______
In this SAMPLE, the name “Marconi” was written on the line to finish the sentence.
DIRECTIONS: Finish every sentence by putting the correct name on the line.

1. The Rough Riders were under the leadership of a man named ______
2. A reaper was invented in 1834 by a man named Cyrus ______
3. In the campaign of 1896, the free coinage of silver was favored by William J. ______
4. The builder of the Panama Canal was Major-General ______
5. Many railroads were developed by a man named James J. ______
6. The North Pole was discovered by Robert E. ______


Completion items 


Completion items may be either of the sentence or the paragraph 
type. Frequently there is little by which a sentence completion item 
can be distinguished from the simple recall item. The more typical 
form of the completion exercise, however, is that based on a para- 
graph of unified material in which several blanks are provided for 
the pupil to fill with the words, numbers, or short phrases that cor- 
rectly complete the meaning. Since blanks in the completion exercise 


8 Robert K. Speer, Lester D. Crow, and Samuel Smith, National Achievement 
Tests: American History, Grades 7 and 8. Published by Acorn Publishing Co., 1939. 
only occasionally occur at the ends of sentences, pupil responses 
typically are scattered over the page. An adaptation of this type of 
exercise places a number in each blank and similarly numbered 
blanks at the right-hand margin for use by the pupils in recording 
their answers. This results in simplifying the scoring procedure for 
completion exercises. 
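
The numbered-blank adaptation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `number_blanks`, the `___` marker for an omitted word, the width of the answer space, and the sample sentences are all hypothetical choices and are not taken from any published test.

```python
import itertools
import re

def number_blanks(paragraph, first_number=1, marker="___"):
    """Number each blank in the text and build a right-margin answer column."""
    numbers = itertools.count(first_number)
    used = []

    def numbered(_match):
        # Give the next blank its number and remember it for the margin.
        n = next(numbers)
        used.append(n)
        return f"_({n})_"

    body = re.sub(re.escape(marker), numbered, paragraph)
    # One answer space of constant width per blank, labeled with its number.
    margin = [f"({' ' * 10}) {n}" for n in used]
    return body, margin

text = ("If a driver feels ___, he should consider it a danger signal. "
        "There are many causes of ___.")
body, margin = number_blanks(text, first_number=24)
print(body)
print("\n".join(margin))
```

Because every answer space has the same width and falls in one column, the scorer can lay a key beside the margin instead of hunting through the paragraph for scattered responses.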

Uses and limitations of completion items. Similarities between the 
simple recall item and the sentence and paragraph completion exer- 
cise result in considerable similarity of their uses and limitations. 
Both are typically rather highly factual, but the latter requires the 
pupil to handle a larger unit of thought and to integrate his ideas 
more fully. Both are difficult to score objectively, and must be so 
constructed that the blanks call for definite responses. Neither can 
be scored directly by mechanical methods. Both may become puzzle 
situations for the pupil if too much of the thought is omitted from 
the statement to permit of reasonably quick comprehension of mean- 
ing by the pupil. The completion exercise is somewhat harder to score 
than the simple recall item unless a device that results in aligned 
and marginal responses is employed. 

Completion examples are not so widely adaptable as simple recall 
items because of the need for broader and more unified thought units 
in the former. However, both forms are useful with a wide variety 
of content. The completion sentence is applicable, for example, in 
situations involving use of the correct language form in a given 
setting in English or the foreign languages, in completing arithmetical 
examples of the equation form, and in a variety of situations in the 
social studies and sciences. The paragraph completion exercise is 
useful in various courses for situations in which a chronological, or- 
ganizational, sequential, or cause and effect type of pattern exists, as, 
for example, with the processes involved in a complete cycle of blood 
circulation in the human body. 

Major types of completion items. Sentence completion exercises 
frequently require the filling of two or more blanks by the pupil and 
the blanks do not, of course, occur at the ends of the sentences, as 
they typically do in simple recall items. The paragraph completion 
exercise differs from the sentence completion mainly by consisting of 
a longer and perhaps more complex thought unit, probably by re- 
quiring more pupil responses, and by consisting of two or more 
sentences in a well-unified paragraph. 


Excerpt from Metropolitan Reading Test ⁹


Directions. In each paragraph a blank line means that a word has been 
left out. Read each paragraph. Then think of the word that should be in 
each blank. Write the word in the parentheses at the side of the page. You 
should get the answer from the paragraph itself. 


SAMPLE. Dick, Tom, and Fred are brothers. The names          (          ) a
of Dick's brothers are _(a)_ and _(b)_.                      (          ) b


24-27. Many bad automobile accidents happen as a 
result of drivers' going to sleep at the wheel. If a 
driver feels _(24)_, he should consider it a danger signal.   (          ) 24
There are many causes of _(25)_. Sometimes the hum           (          ) 25
of the motor or the rapid passing of objects makes a 
driver sleepy. Some drivers get drowsy at certain 
times of day. Loss of sleep causes drowsiness, too. 
Some drivers pull off the _(26)_, stop the _(27)_, and take   (          ) 26
a nap when they are sleepy.                                  (          ) 27


Alternate-response items 


Alternate-response items are those in which only two alternatives 
are presented to the pupil for his response. The simplest and most 
common forms of alternate-response items are the true-false, re- 
quiring an answer concerning the truth or falsity of a statement, and 
the yes-no, requiring one of those answers to a question. Another form 
involves the selection of the correct one or better one of two al- 
ternatives that are presented as possible completions in a given 
setting. 

The true-false, as the most widely used alternate-response type, 
has doubtless been the most popular form of recognition item and 
probably remains so today for classroom testing purposes. It typi- 
cally involves a very simple method of response by the pupil in 
aligned answer positions at either the left or right side of the test 
paper. 

Uses and limitations of alternate-response items. The true-false 
item is widely applicable in all subject fields. Its ease of construction 
has resulted in greater popularity and wider use than have been 
attained by any other item form. However, its ease of construction is 
frequently delusive, for the elimination of ambiguities from the true- 


9 Richard D. Allen and others, Metropolitan Achievement Tests, Test 1, Reading, 
Advanced Battery. Published by World Book Co., 1946. 
false item is sometimes difficult to accomplish. Since this weakness 
seems to be inherent in the item itself, test technicians are tending to 
use it less and less. It and the simple recall item are perhaps most 
frequently taken almost verbatim from textbooks, and consequently 
in such cases a premium is placed upon photographic memory for 
facts by pupils. 

Alternate-response item forms have the advantage of affording 
coverage of many individual items in a short period of time, since 
the time requirements are less than for most item types. On the other 
hand, guessing is more of a problem for this than for any other item 
type, for which reason little diagnostic value can be obtained by 
using an item-count method of analyzing the results for a group of 
pupils or an individual pupil. Alternate-response items are highly 
objective in scoring, and are readily understood by pupils. This item 
type is readily scorable by mechanical methods in all of its common 
varieties. 
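
The arithmetic behind the remark on guessing can be shown with a short Python sketch. It assumes a pupil who guesses blindly and independently on every item, and the 100-item test length is a hypothetical figure chosen only to make the comparison easy to read.

```python
from fractions import Fraction

def expected_chance_score(n_items, n_alternatives):
    """Expected number right when every item is answered by blind guessing."""
    # Each guess succeeds with probability 1 / n_alternatives.
    return Fraction(n_items, n_alternatives)

for label, alternatives in [("true-false", 2), ("four-choice", 4), ("five-choice", 5)]:
    print(label, expected_chance_score(100, alternatives))
```

Under these assumptions the blind guesser can expect half of the alternate-response items right, against a quarter or a fifth of the multiple-choice items, which is why a correct answer to a single two-alternative item tells so little about the pupil.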

True-false items can be used satisfactorily in many situations if 
they are constructed carefully enough to keep them free from am- 
biguity. They are especially useful for situations in which the ab- 
sence of enough plausible alternative responses makes the use of a 
multiple-choice item impracticable. 

The type of alternate-response form that requires the pupil to 
select the one of the two alternatives that correctly fills a particular 
need is very widely useful for measurement of a functional type of 
instructional outcome in English and the foreign languages. It could 
be used in a wide variety of situations, but in practice this item form 
has been limited largely to language usage testing. 

Major types of alternate-response items. The most common form 
of true-false item may be set up so that the pupil will respond by 
encircling or underlining a T or F, or a True or False. The arrange- 
ment of answer spaces in columns under T and F in which the answer 
is indicated by an “X” or check mark has the added advantages of 
speed of response and ease of scoring. 

Another common form is presented as a question, the pupil's 
responses usually consisting of encircling or underlining either Yes 
or No. This form, which differs little from that presented above, is 
preferable for use with young children because the situation presented 
is a very normal one. An alternate-response form commonly used in 
English and foreign language tests involves the selection of the proper 
one of two given word forms for use in a certain setting and indication 
of the one selected by crossing out the incorrect word form or 
marking the correct word form. 


Excerpt from Progressive Tests in Related Sciences ¹⁰


DIRECTIONS: Read each statement below. If 
the statement is TRUE, you are to mark the 
letter T; if it is FALSE, mark the letter F. 


41. Water power is used to produce electrical power.          T   F
42. Most of our paper is made from wood.                      T   F
43. The dog was one of the first animals to be tamed by man.  T   F


Multiple-choice items 


Multiple-choice items have come to be the most popular form for 
standardized testing of recent years, and are increasingly coming 
into wide use for informal objective testing as well. A recognition 
item type, the multiple-choice item commonly consists of an incom- 
plete statement followed by from three to five responses that will 
complete the statement with varying degrees of accuracy. The pupil 
is expected to choose the response that correctly or best completes 
the statement, and typically to indicate his choice by an answer 
appearing in a column at the left or the right side of the test paper. 

This item type may be in question rather than in statement form 
or may consist of three to five words, symbols, or numbers from which 
the correct one is to be chosen by the pupil. It may request the best 
of several correct or partially-correct answers on a given point. It 
may even require responses for the two or more correct answers 
among those furnished, in which case it becomes a multiple-response 
item. 

Uses and limitations of multiple-choice items. The multiple-choice 
and its numerous variants perhaps represent the most valuable and 


10 Georgia S. Adams and John A. Sexson, Progressive Tests in Social and Related 
Sciences, Test 6, Elementary Science, Elementary Battery. Published by California 
Test Bureau, 1946. 
at the same time the most widely applicable type of objective test 
item. It is readily, although not necessarily easily, adaptable to the 
measurement of discriminative power, inferential reasoning, inter- 
pretive ability, reasoned understanding, generalizing ability, and 
other types of outcomes deriving from the pupil's ability to apply 
and use facts. It is not difficult for pupils to understand and use. It is 
highly objective, and can be readily scored either by hand or by 
machine. Item-counting procedures based on the results for an indi- 
vidual pupil or a class have considerable diagnostic and analytic 
significance. 

Multiple-choice and multiple-response items in their variety of 
forms are so widely adaptable to different types of content that the 
preceding discussion should make the fact evident without illustra- 
tion. As is the case for the true-false item, there is probably no field 
of learning to which the multiple-choice item is not widely applicable. 
However, the necessity for finding at least two and in many cases 
as many as four plausible responses to go with the correct completion 
somewhat limits the applicability of the item form within each sub- 
ject field. Ingenuity on the part of the test maker and the results of 
practice in item construction make the item type very widely ap- 
plicable to the content of various instructional areas, however. 
Multiple-choice items are not as easily constructed as are some other 
objective test forms, for there are various technical problems that 
require great care in the drafting of items. The incorrect answers 
pupils give to simple recall items often serve as excellent incorrect 
alternatives if the item is converted to multiple-choice form. 
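
The last suggestion, that wrong answers to recall items make good incorrect alternatives, lends itself to a simple tally. The following Python sketch assumes the teacher has copied the pupils' written answers to one recall item into a list; the function name and the sample answers are hypothetical.

```python
from collections import Counter

def suggest_distractors(responses, correct, how_many=3):
    """Return the most frequent wrong answers, most common first."""
    normalized_correct = correct.strip().lower()
    wrong = Counter(
        r.strip().lower() for r in responses
        # Skip omitted answers and answers that match the key.
        if r.strip() and r.strip().lower() != normalized_correct
    )
    return [answer for answer, _count in wrong.most_common(how_many)]

# Hypothetical answers to "The first President of the United States was ____"
answers = ["Washington", "Lincoln", "Jefferson", "lincoln", "Washington",
           "Adams", "Lincoln", "", "Jefferson"]
print(suggest_distractors(answers, "Washington"))
```

The tally only proposes candidates. The test maker must still judge whether each wrong answer is plausible and clearly incorrect before using it as an alternative.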

Major types of multiple-choice items. The basic and probably 
most common multiple-choice form is that in which the correct or 
best completion is to be selected by the pupil from the three to five 
that are furnished for an incomplete declarative sentence or in answer 
to a question. 

A common use of multiple-choice forms is in testing various types 
of reading ability, as, for example, ability to comprehend the mean- 
ing of a paragraph, by basing a single item or several items on a 
passage of reading material in English or a foreign language. Some- 
what similarly, multiple-choice items can singly or by groups be 
based on a map, chart, diagram, or table, and require the pupil to 
interpret the data presented as a basis for answering. 


Excerpt from Cooperative Social Studies Test ¹¹


3. The period of change from hand-made to 
   machine-made goods is known as the 
   3-1 Industrial Revolution. 
   3-2 Handicraft Age. 
   3-3 Age of Big Business. 
   3-4 Reformation. 
   3-5 Renaissance. . . . . . . . . . . . . 3(   )


4. Which of these was not one of the original 
   thirteen states? 
   4-1 Virginia. 
   4-2 Georgia. 
   4-3 Massachusetts. 
   4-4 Florida. 
   4-5 New York. . . . . . . . . . . . . . 4(   )


Another variation, called the multiple-response, is that in which 
the pupil is asked to select all of the correct completions from the 
three to five typically given. There may be only one or as many as 
several correct answers to each item when this form is used. Each 
correct response is ordinarily assigned one scoring point of credit. 
The fact that not only the choice but also the response is plural 
accounts for the distinction in names between this and the more 
common multiple-choice item. 
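
A minimal Python sketch of the one-point-per-correct-response rule follows. The text does not say how incorrect selections are to be treated, so this sketch simply ignores them; that choice, the function name, and the keyed responses are all assumptions made for illustration.

```python
def score_multiple_response(selected, keyed_correct):
    """One point for each keyed-correct response the pupil selected."""
    # Assumption: wrong selections neither earn nor lose credit.
    return len(set(selected) & set(keyed_correct))

keyed = {7, 8, 10}        # hypothetical key for a five-response item
pupil = {7, 9, 10}        # the pupil marked two keyed responses and one other
print(score_multiple_response(pupil, keyed), "of", len(keyed), "points")
```

A teacher who wishes to discourage pupils from marking every response would need some further rule for wrong selections, which lies outside what is stated here.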


Excerpt from Read General Science Test ¹²


28. The geological formation above constitutes evidence of — 


    6. volcanic action. 
    7. erosion. 
    8. folding. 
    9. sedimentation in a running stream. 
    10. movement in the earth's crust. 


11 Harry D. Berg and Elaine Forsyth, Cooperative Social Studies Test for Grades 
7, 8, and 9, Form X. Published by Cooperative Test Service, 1947. 

12 John G. Read, Read General Science Test, Form A. Published by World Book 
Co., 1950. 


Matching exercises 


Matching exercises are in effect combinations of multiple-choice 
items in such a manner that the choices are compound in number. 
Matching exercises differ from all of the objective forms treated 
previously in the fact that they must occur in groups. There is really 
no such thing as a matching test item, unless a correct pairing pulled 
from a group of which it is a part might be so designated. Matching 
tests are by nature, then, multiple in type, and the number of scoring 
points is ordinarily determined by the number of responses required 
of the pupil. 

A matching exercise or set usually consists of two lists of related 
facts between which a constant type of relationship exists throughout. 
The pupil's responses are expected so to pair items in the two lists 
as to indicate their proper relationships. Variations involve un- 
balanced sets, in which more items occur on one side than on the 
other, sets in which items of one side may be used more than once 
each, and even compound sets in which double or even triple match- 
ings of all items are necessitated by the provision of three or even 
four related lists instead of the customary two. 

Pupil responses to matching exercises are usually in the form of 
identifying numbers or letters written in column form in parallel 
with the items in one of the two or more lists. The unbalanced set 
has the definite advantage of reducing the chances of guessing the 
correct answers to practically zero. 
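
The effect of an unbalanced set on guessing can be worked out directly. The Python sketch below assumes that each response may be used only once and that a pupil who is sure of some pairings fills in the rest at random; the set sizes are hypothetical. It is meant to show the direction of the effect rather than to establish any particular figure.

```python
from fractions import Fraction
from math import perm  # available in Python 3.8 and later

def chance_all_remaining_correct(n_premises, n_options, n_known):
    """Probability that blind guessing completes the set correctly.

    Assumes each option is used at most once and that the pupil
    already knows n_known of the pairings for certain.
    """
    remaining_premises = n_premises - n_known
    remaining_options = n_options - n_known
    # Ways to assign distinct remaining options to the remaining premises.
    ways = perm(remaining_options, remaining_premises)
    return Fraction(1, ways)

# A pupil knows four of five pairings and must guess the fifth.
print(chance_all_remaining_correct(5, 5, 4))   # balanced set of five and five
print(chance_all_remaining_correct(5, 8, 4))   # unbalanced set of five and eight
```

In the balanced set the last pairing is forced by elimination, so the pupil receives credit for a response he does not know. The three extra options in the unbalanced set remove that certainty.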

Uses and limitations of matching exercises. Matching exercises are 
likely to be rather highly factual in nature, and to make use of the 
who, what, when and where types of relationships and of identifying 
or naming abilities. They are rather easy to construct, and are per- 
haps for that reason more widely used than their characteristics 
warrant. They are likely to include clues to the correct responses 
unless there is rigid adherence to uniform categories of items in a 
matching set, and this restriction, desirable though it is, limits at 
least one side of the test unit to numbers, words, or at least short 
phrases. This restriction in turn tends to limit use of the item form 
mainly to factual types of subject matter. 

The matching exercise is economical of space and of construction 
time. It is useful for matching terms and definitions, names and 
events, events and dates, books and authors, causes and effects, 
generalizations and applications, words and symbols, English and 
foreign words, and many other pairs of related items by use of verbal 
lists. It is also useful with numbered maps, charts, or pictorial 
representations for matching places and names, places and events, 
trends and dates, or objects and names in great variety. The match- 
ing exercise appears to be most useful with factual knowledges in a 
great variety of situations where it is desirable to test over a number 
of comparable relationships. 

Major types of matching exercises. The fundamental form of 
matching exercise has an equal number of items in both lists and in- 
volves the use of all of the items in the pairing. Unbalanced matching 
sets provide more items on one than on the other side and require 
that only as many of the items of the longer list be used as have 
proper pairings with the items of the shorter list. 


Excerpt from Metropolitan Literature Test ¹³


Directions. In the parentheses after each character in Column 2 put the 
number of the character from Column 1 that appears in the same story or 
poem. 


Column 1                     Column 2 

 1. Ichabod Crane            53. Marygold . . . . . . . . . . . (   ) 53
 2. Don Quixote              54. The Mayor of Hamelin . . . . . (   ) 54
 3. Laurie                   55. Jim Hawkins . . . . . . . . . . (   ) 55
 4. The Pied Piper           56. Nello . . . . . . . . . . . . . (   ) 56
 5. Midas                    57. Dulcinea . . . . . . . . . . . (   ) 57
 6. [illegible] Harriet      58. Brom Bones . . . . . . . . . . (   ) 58
 7. Patrasch                 59. Elaine . . . . . . . . . . . . (   ) 59
 8. Black Beauty             60. Elizabeth Ann . . . . . . . . . (   ) 60
 9. Launcelot 
10. Long John Silver 


Diagrams, maps, charts, and pictures may be used in what are 
often called identification exercises by requesting the pupil to match 
identifying names of places, objects, or parts with their representa- 
tions in the accompanying figure or picture. 


5 CONSTRUCTING OBJECTIVE TEST ITEMS 


This section of the chapter considers the general principles to be 
followed in the construction of various objective item types. Such 
questions as adaptation of item types to various subject matter 


13 Richard D. Allen and others, Metropolitan Achievement Tests, Test 6, Literature, 
Advanced Battery. Published by World Book Co., 1946. 
and the construction and use of the test as a whole were considered 
earlier in the chapter. Because of the multiplicity of item types, it 
is impossible to discuss all of them in detail. Therefore, the sug- 
gestions are intended mainly for the basic or most common forms 
of items, although in many instances they are equally well adapted 
to modified types of the basic items. 

General suggestions that seem to be equally applicable to all 
objective item types are given in the following section. These should 
serve as the introductory portion of the lists of suggestions on later 
pages for the various common or basic forms of items. The student 
will find that frequent reference to the sample items in the preceding 
section will be helpful in the study of methods for constructing the 
various item types. It should be apparent that common sense and 
personal experience must furnish the basis for recommendations on 
many issues discussed. Objective evidence is not available concerning 
the relative merits of different approaches on many of the issues, 
and on other points only inconclusive evidence and conflicting 
opinions and practices are presented in the educational literature. 
Therefore, this section can be said to present the authors' views, 
based on objective evidence and opinions of others and on their own 
experience in test construction, on a variety of detailed points which 
must be considered if objective item types are to be well constructed.¹⁴ 


General suggestions for constructing objective items 


A number of the suggestions given here apply equally well to all 
or most objective item types. Attention will be given in the sub- 
sequent pages to suggestions that apply to recall types and to specific 
item types of the recognition form. 

(1) Rules governing good language expression should be observed. 
This point deserves mention because carelessly framed and un- 
grammatical items are more likely to be subject to misinterpretation 
than are items that are carefully constructed and correctly stated. 

(2) Difficult words should be avoided. Care should be taken at all 
times to make certain that the words used in objective items are 
known to all pupils, for every pupil should be able to understand 
the intent of all items. This recommendation does not, of course, 


14 See also Robert L. Ebel, “Writing the Test Item.” Educational Measurement. 
American Council on Education, Washington, D. C., 1951. p. 213-44. 
apply to the technical words of the subject being tested, for knowledge 
of technical vocabulary is an outcome of instruction that may 
well be tested. Every effort should be made to adapt general vo- 
cabulary words to the ability levels of the pupils being tested, how- 
ever. In case of doubt, it is always a safe procedure to choose the 
simpler of two words that might be used in stating a test item. 

(3) Textbook wording should be avoided. It is undesirable to 
obtain items merely by taking a statement from a textbook and 
using it in its exact textbook form or with a negative inserted, a word 
omitted, or other minor adaptation. In the first place, an occasional 
pupil has a memory for specifics of what he has read or heard which 
would enable him to answer the item from memory rather than in 
terms of knowledge and understanding. In the second place, a 
majority of textbook sentences, unless they are from summary para- 
graphs or are topic sentences of paragraphs, are too detailed to merit 
direct attention in a test. In the third place, there is danger that 
items so selected would be too much dependent upon a particular 
textbook or author and not be broadly representative of the field 
being tested. 

(4) Ambiguities should be avoided. Care should be taken to make 
certain that each test item is subject to one and only one interpreta- 
tion. It is not always easy to accomplish this purpose, for ambiguities 
sometimes remain after an item has been carefully framed and 
scrutinized. Items should be sufficiently definite that there is no 
chance for misinterpretation of meaning through reasonable implica- 
tions or logical inferences. Item-counting methods of evaluating items 
after they have been used once are helpful in eliminating ambiguities 
that have been overlooked in the initial framing of a test. 

(5) Items having obvious answers should not be used. Items to 
which answers are obvious have no value in a test and should 
definitely be avoided. 

(6) Clues and suggestions should be avoided. Items containing 
clues or suggestions also contribute nothing to a test and may well 
lack validity. 

(7) Items that can be answered by intelligence alone should not 
be included. Items that depend not at all on knowledge or under- 
standing but that can be answered by the exercise of intelligence 
alone have no place in an achievement test. 

(8) Quantitative rather than qualitative words should be used. 
It is preferable to use words that have quantitative and definite 
meaning rather than words that are qualitative in nature, as a means 
of eliminating items that depend on opinion rather than upon facts. 

(9) Catch words should not be employed. There is no justification 
for the inclusion in achievement test items of catch words, misleading 
statements, or irrelevant confusions. Pupils recognizing such points 
might interpret such features as typographical errors or as un- 
intentional for other reasons and answer them in terms of what they 
thought was intended. Furthermore, the best readers, who are fre- 
quently the best pupils, are perhaps least likely to note minor errors 
in a test because rapid reading entails less attention to specific letters 
and even words than does slow reading. 

(10) Items should not be interrelated. Items should not ordinarily 
be so related, at least if they are adjacent or close together in the 
test, that one depends on one or more other items in such manner that 
an answer to the first determines responses for the related item or 
items. In effect, such dependence places more than the intended 
amount of weight on the first item of any such sequence when answers 
consistent with the first are given by a pupil for subsequent items. 

(11) Response positions should preferably be aligned. It is preferable, 
although not always possible, to have the response positions 
occur in a columnar arrangement. The pupil is aided by such a 
consistent position for responses and scoring of the results is greatly 
facilitated. 
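
The item-counting method mentioned under suggestion (4) can be sketched briefly in Python. The sketch assumes that each pupil's answers have been recorded by item number; the key, the papers, and the two cut-off proportions are hypothetical, and the function names are illustrative only.

```python
def item_counts(answer_key, pupil_answers):
    """Count how many pupils gave the keyed answer to each item."""
    counts = {item: 0 for item in answer_key}
    for answers in pupil_answers:
        for item, keyed in answer_key.items():
            if answers.get(item) == keyed:
                counts[item] += 1
    return counts

def items_to_reread(counts, n_pupils, low=0.25, high=1.0):
    """List items answered correctly by very few pupils or by all of them."""
    return [item for item, right in counts.items()
            if right / n_pupils <= low or right / n_pupils >= high]

key = {1: "T", 2: "F", 3: "T"}     # hypothetical scoring key
papers = [                          # hypothetical answers of four pupils
    {1: "T", 2: "F", 3: "F"},
    {1: "T", 2: "T", 3: "F"},
    {1: "T", 2: "F", 3: "F"},
    {1: "T", 2: "F", 3: "T"},
]
counts = item_counts(key, papers)
print(counts)
print(items_to_reread(counts, len(papers)))
```

An item that every pupil answers correctly may have an obvious answer or contain a clue, and an item that nearly every pupil misses may be ambiguous. The count does no more than point to items worth rereading, since a low count may also mean simply that the item is difficult.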


Suggestions for constructing recall-type items 


Several suggestions applicable to recall item types alone are given 
and briefly discussed here. These suggestions represent for recall 
items a continuation of the list of general suggestions in the preceding 
pages of this chapter. As the simple recall and completion types are 
very similar except for two of the following points, the recommenda- 
tions for these item types are included in one list. 

(1) Lines for responses should be of the same and of adequate 
length. In recall item forms the length of all lines or blanks provided 
for pupil responses should be the same. The lines or blanks should 
be long enough to provide for normal writing of the longest word 
likely to be given as an answer. The constant length of line avoids 
giving any clue to the length of the correct answer that might be of 
use to the pupil in choosing between two answers he might be considering. 

(2) Desired responses should be definite. Each recall item should 
require a definite idea or concept as the correct answer in order to 
reduce the possibility of misunderstanding by the pupil and to insure 
objectivity of scoring. The response may be a word, a date, a number, 
a symbol, a formula, an answer to a problem, or even a short phrase. 

(3) Desired responses should be important. Only important and 
crucial aspects of a statement should be omitted in recall forms of 
items, for the omission of secondarily important or unimportant 
aspects of a statement reduces the significance of the item. 

(4) Any correct answer should receive credit. Any answer that is 
correct, whether or not it is the one the teacher expected, should 
receive credit and the answer should be added to the scoring key for 
future use. 

(5) Spelling errors probably should not be penalized. Unless spell- 
ing errors occurring in pupil answers are in words technical to the 
subject for which the test is given, scoring should probably be in 
terms of the pupil's intent rather than in terms of his spelling ac- 
curacy. 

(6) “A” or “an” should not immediately precede a blank. Either
of the indefinite articles restricts the nature of the response word to 
follow in terms of grammatical correctness, so that the range of 
possible correct answers is mechanically narrowed for the pupil when 
“a” or “an” immediately precedes a response position. Employment
either of the definite article “the” or of “a(n),” which means either 
“a” or “an,” is permissible. 

(7) Positions for responses should ordinarily be at the ends of the 
sentences. It is preferable that blanks to be filled occur at the end 
rather than in the middle of sentences. Statements can usually be so 
worded that this is easily accomplished. 

(8) Completion paragraphs should be unified wholes. A com- 
pletion paragraph should be unified and well organized and should 
not consist of several unrelated or poorly related sentences. The 
pupil’s ability to grasp the entire thought unit should be essential 
to correct responses for the several blanks in the paragraph. 

(9) Completion paragraphs should not obscure the meaning by 
containing too many blanks. Sufficient of the paragraph should be 
given that the meaning is clear to an informed and intelligent reader. 
It is easy for the teacher constructing a paragraph, who knows 
definitely what the paragraph is about, to assume unconsciously that
the pupil should have the same knowledge and consequently leave
out so many words that to the pupil the meaning is obscure or not
ascertainable. 
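
Suggestion (1) of this list can be illustrated by a short sketch. It assumes that the teacher keeps each completion item as a statement ending in a blank, together with its acceptable answers. The function names and the handwriting allowance are hypothetical and are not taken from this chapter. One blank length is computed for the whole test from the longest acceptable answer, so that no blank gives a clue to the length of its answer.

```python
def uniform_blank(items, allowance=4):
    """Return one blank, the same length for every item.

    items: list of (statement, acceptable_answers) pairs.
    allowance: extra characters for handwriting; an assumed value.
    """
    longest = max(len(answer) for _, answers in items for answer in answers)
    return "_" * (longest + allowance)


def render(items):
    """Place the same blank at the end of every statement."""
    blank = uniform_blank(items)
    return [f"{statement} {blank}" for statement, _ in items]


items = [
    ("The chemical symbol for sodium is", ["Na"]),
    ("The process by which green plants make food is called", ["photosynthesis"]),
]
for line in render(items):
    print(line)
```

Both statements end in a blank of identical length, although one answer has two letters and the other fourteen.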


Suggestions for constructing alternate-response items 


The suggestions below for the alternate-response type of item 
supplement the general suggestions previously discussed. As the 
true-false is the most widely used of these types, most of the sug- 
gestions below relate primarily to it or a closely allied form. 

(1) Double negative statements should be avoided. Double nega- 
tives serve no useful purpose, but they may cause needless and 
harmful reading problems for some pupils. 

(2) Statements that are part true and part false should not be 
used. Statements should be either true or false, for the use of a true 
major clause and a false dependent clause or of some other combina- 
tion of truth and falsity is confusing to the pupil and adds nothing 
to the test. Although such part true, part false statements are used 
by some test workers, the result frequently is an unintentional 
“catch” item. 

(3) “Specific determiners” should be used sparingly and carefully.
Such specific determiners as “always” and “never” occur in false 
statements much more frequently than in true statements. State- 
ments containing cause or reason clauses also tend to be false more 
often than true. On the other hand, comparison statements and very 
long statements are more often true than false. 

(4) Answers should be required in a highly objective form. It is 
inadvisable to have pupils write a letter, such as T or F, or a word, 
such as True or False, in answering the items, for those letters and 
words look much alike when poorly written or when written with 
the attempt to confuse the scorer. Methods requiring pupils to en-
circle or to underline T or F, Yes or No, or having pupils mark an
“X” in the brackets in either the T or F column are to be preferred.

(5) Approximately an equal number of true and false statements
should be used. It is not desirable to have a great imbalance of true 
and false statements, but on the other hand there is no need for 
exactly the same number of each type of item. 

(6) Random occurrence of true and false statements should be 
employed. A coin may be tossed or some other simple chance pro- 
cedure be used to make certain that true and false statements occur 
in random or chance order.
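
Suggestions (5) and (6) can be combined in a brief sketch. The coin is simulated by a random number generator, and the key is drawn again whenever the proportion of true statements departs too far from one half. The function name, the tolerance of 0.2, and the limit on the number of draws are assumptions made for illustration only.

```python
import random


def true_false_key(n_items, rng=None, max_imbalance=0.2, max_draws=1000):
    """Draw a true/false key by simulated coin tosses.

    max_imbalance is an assumed tolerance: a key is accepted only when the
    share of true statements is within this distance of one half.
    """
    rng = rng or random.Random()
    for _ in range(max_draws):
        key = [rng.choice("TF") for _ in range(n_items)]
        share_true = key.count("T") / n_items
        if abs(share_true - 0.5) <= max_imbalance:
            return key
    raise ValueError("no sufficiently balanced key found; relax max_imbalance")


key = true_false_key(20, random.Random(7))
print("".join(key), key.count("T"), "true,", key.count("F"), "false")
```

The teacher would then word the statement in each position so that it is true or false as the key directs.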


Suggestions for constructing multiple-choice items


The following suggestions, supplementing the general recommenda- 
tions given in an earlier section of this chapter, are primarily for the 
multiple-choice item type with only one correct answer or the closely 
related best-answer type. 

(1) As much of the statement as possible should occur in the in-
troductory portion or stem. There is no justification for repetition
of the same introductory word or words in each of the alternatives; 
the introductory, or common, portion of the item should include as 
much as possible as a means of saving space. 

(2) Alternative answers should all be stated in correct gram-
matical style. It should be possible to follow the stem of an item
with any one of the alternative answers and have the statement be 
grammatically correct. 

(3) Incorrect alternatives, or confusions, should be plausible. 
One or more alternatives that are obviously incorrect in effect give 
the pupil a greater chance of guessing the correct answer. Pupils’ 
wrong answers to recall items often provide excellent confusions for 
the same items when put into multiple-choice form.

(4) “A” or “an” should not ordinarily be used to introduce the
alternative answers. Unless all answers can follow the same article
with grammatical correctness, the “a(n)” device mentioned above or
the definite article should be used to introduce the alternative
answers.

(5) Items should ordinarily have four or five alternative answers. 
Except for use with very young children, four or five alternative 
answers are preferable as a means of reducing the chances of guess- 
ing the correct answer and in order to obtain the desired degree of 
item difficulty, although two well-chosen confusions are preferable to 
three or four implausible wrong answers. 

(6) All items should ordinarily have the same number of alternate 
answers. Four- and five-response items should ordinarily not be 
mixed in the same test, for the same number of alternatives for each 
item is preferable for ease in correction for guessing. 

(7) Alternative answers should ordinarily occur at the end of the 
statement. Although the responses may be so placed that additional 
material common to all is necessary to complete the statement, re-
wording will ordinarily make possible their placement at the con-
clusion of the statement.

(8) Answers should be required in a highly objective form. It is
perhaps preferable that a pupil write the identifying letter or number 
for the intended response or encircle or otherwise mark it in a special 
answer column. There is little efficiency in a method requiring under- 
lining or, worse yet, both underlining and otherwise indicating, an 
intended answer. 

(9) Correct responses should be distributed with approximate
equality among possible answer positions. In four-response items, for 
example, the first, second, third, and fourth alternatives should be 
correct for approximately the same number of items. It may be 
desirable to favor the centrally-located responses slightly over first 
and last responses for the correct answers. 

(10) Random occurrence of correct responses should be employed. 
A die may be tossed (disregarding the six) or some other simple 
chance procedure be used to insure random order in the occurrence 
of the various correct answer positions. 
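
Suggestions (9) and (10) can be met together by a procedure somewhat different from the tossing of a die: a list is first made in which every answer position occurs about equally often, and the list is then shuffled by chance. The sketch below follows that substitute procedure. The function name is hypothetical, and the slight favoring of central positions mentioned under suggestion (9) is not attempted.

```python
import random


def answer_positions(n_items, n_choices=4, rng=None):
    """Assign a correct-answer position (1..n_choices) to each item.

    Every position is used either the floor or the ceiling of
    n_items / n_choices times, and the order is then shuffled by chance.
    """
    rng = rng or random.Random()
    base, extra = divmod(n_items, n_choices)
    positions = []
    for position in range(1, n_choices + 1):
        positions.extend([position] * base)
    # The positions that receive one additional item are chosen by chance.
    positions.extend(rng.sample(range(1, n_choices + 1), extra))
    rng.shuffle(positions)
    return positions


key = answer_positions(30, n_choices=4, rng=random.Random(3))
print(key)
print({p: key.count(p) for p in range(1, 5)})
```

With thirty four-response items, two positions are correct eight times and two are correct seven times, whatever the chance order turns out to be.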


Suggestions for constructing matching exercises 


The suggestions given below for the common type of matching 
set supplement the general suggestions on pages 187 to 189 for all 
types of objective items. 

(1) Only one correct matching for each item should be possible. 
If items are not mutually exclusive, i.e., subject to only one correct 
matching, some pupils may be penalized because they happen to 
choose the one of two or more possible matchings for a certain item 
that results in the lack of a proper answer for an item at the end of 
the matching process, when the same number of items appears in 
each column. 

(2) Consistency of grammatical form should be used. All items 
in the left-hand set should agree in form and all items in the right- 
hand set should likewise be in agreement. It should be possible inso- 
far as the form of the statements is concerned to associate any item 
of the left with any item of the right column. If this is not true, 
answers can be obtained partly by attention of the pupil to gram- 
matical detail in the statements. 

(3) Consistency of classifications should be maintained. Each of
the two lists should contain items that are of the same category. 
Although matching sets that are not consistent within each column 
are used by some test makers, the results from mixed categories are 
sometimes confusing, often provide a means of answering items by
the exercise of general intelligence alone, and in general are un- 
satisfactory. Consistent categories are much to be preferred. 

(4) Matching sets should be neither too long nor too short. From
ten to fifteen pairings are probably optimum for balanced-matching 
groups. More than fifteen pairs become cumbersome and time- 
consuming. Fewer than ten pairings present opportunities for good 
guessing on the last few matchings by the pupil who knows most of 
the pairings. Unbalanced matching sets are definitely preferable and 
perhaps should be used in all matching sets. 

(5) Items should be listed in random order in each list. Such 
logical arrangements as alphabetical order of first letters of words 
and chronological order of dates usually accomplish this purpose, 
for such arrangements are not likely to have any similarity to the 
relationships between the items of the two lists and furnish no clues 
to the pupils. 

(6) A set of matching items should always be complete on one 
page. The necessity for frequent rereading of items makes very in- 
efficient any separation of a set of matching items by having it appear 
on two pages of the test. 

(7) Answers should be required in a highly objective form. 
Perhaps the most satisfactory method of providing for pupil re- 
sponses is to accompany one list with letters or numbers identifying 
each item and the other list by answer positions, and then to have 
pupils write the letters or numbers in the answer column in such 
manner as to indicate their choices. 
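
Several of these suggestions can be brought together in a small sketch: each response matches only one item, an extra response makes the set unbalanced, both lists are placed in alphabetical order, and the responses are lettered so that pupils need write only a letter. The function name and the sample content are hypothetical, and the set is kept far shorter than the ten to fifteen pairings recommended above.

```python
import string


def build_matching(pairs, extra_responses=()):
    """Lay out an unbalanced matching set and its scoring key.

    pairs: dict mapping each left-hand item to its one correct response.
    extra_responses: responses that match no item.
    """
    correct = list(pairs.values())
    if len(set(correct)) != len(correct):
        raise ValueError("each response should match only one item")
    responses = sorted(set(correct) | set(extra_responses))
    if len(responses) > len(string.ascii_uppercase):
        raise ValueError("more responses than identifying letters")
    letters = dict(zip(responses, string.ascii_uppercase))
    premises = sorted(pairs)
    key = {number: letters[pairs[premise]]
           for number, premise in enumerate(premises, start=1)}
    return premises, letters, key


pairs = {
    "Boyle": "gas pressure and volume",
    "Mendel": "heredity in pea plants",
    "Pasteur": "germ theory of disease",
}
premises, letters, key = build_matching(pairs, extra_responses=["laws of motion"])
for number, premise in enumerate(premises, start=1):
    print(f"____ {number}. {premise}")
for response, letter in letters.items():
    print(f"{letter}. {response}")
print(key)
```

The alphabetical order of each list bears no relation to the correct pairings, and the scoring key is produced along with the exercise.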


6 USING RESULTS OF INFORMAL OBJECTIVE TESTING 


Only brief mention is made here of the uses to which the informal 
objective examination can be put. The alertness and ingenuity of 
the teacher largely determine the values that result from his use of 
the informal objective test. 


Informal objective tests in instruction 


The evaluation of pupil and class achievement is most effectively 
accomplished through the use of the objective examination. Even if 
there were standardized tests for the measurement of most of the 
outcomes of class instruction, they would be unsuited for this type 
of use. Properly constructed objective examinations within certain
limits aid the teacher in determining points at which instructional 
adjustments should be made. Pupils, likewise, may be led to discover 
their specific weaknesses in achievement. Informal objective test 
results are thus shown to have general diagnostic value for relative 
pupil strengths and weaknesses. Such tests can also be used for 
instructional as well as for measurement purposes. Informal ob- 
jective drill and remedial devices can be constructed by the alert 
teacher. 


Informal objective tests in determining course marks 


Pupils' scores from valid and reliable objective examinations afford
the teacher's best single basis for measuring and rating pupil achieve- 
ment within a given subject. The results of objective examinations 
enable the teacher to improve the reliability of his marks if the tests 
themselves are valid and reliable measures of the course outcomes. 
A teacher can learn with practice to construct course examinations
that will satisfy the criteria of a good examination, and that will be 
more valid tests for the outcomes of his particular course than 
standardized tests could ever be. The remaining step for the use of 
test results in marking is to convert scores to the particular type of 
marks desired. Because of the importance of this use of informal 
objective test scores, a widely used method of converting them to 
course marks is explained in Chapter 13. This system can readily
be adapted as required, if it is not applicable in its present form, to 
the marking system used in a particular school. 


Topics for Discussion 


1. Explain the differences between standardized tests and informal
objective examinations. 

2. What reasons can you advance for the general conclusion that there 
is no conflict between standardized tests and informal objective 
tests? 

3. Discuss the major advantages of the informal objective test over 
the traditional or essay test. 

4. Discuss the limitations sometimes claimed for the informal ob- 
jective test. 

5. Distinguish among various types of instructional outcomes and
illustrate each.

6. Briefly comment upon the selection of content and general con-
struction of the teacher-made objective test.

7. Discuss pro and con the advisability of using several types of ob- 
jective test items in the same classroom test. 

8. Why should an objective test be so difficult that no pupil can make 
a perfect score and yet so easy that no pupil will have a zero score? 

9. What cautions should be observed in administering the teacher- 
made objective test? 

10. How should the various types of objective test items ordinarily be
scored? 

11. What procedures are useful to the teacher in the revision of the
informal objective examination? 

12. What are the major uses of the informal objective test? 

13. Clearly distinguish between recall and recognition item forms. 

14. Distinguish between the two ordinary forms of recall items and 
illustrate each type. 

15. Give examples of several alternate-response item types. 

16. Show the differences among the ordinary multiple-choice, the
multiple-response, and the best-answer item forms. Illustrate.

17. Which type of matching exercise, the balanced or the unbalanced, 
is preferable? Why? 

18. Give some of the most important general suggestions for the con-
struction of objective test items.


Constructing and Using
Performance Tests 


THIS CHAPTER presents a brief treatment of the following points in 
the construction and use of performance tests: 


Growth of interest in performance testing of achievement.
Measurable characteristics of performance.
Object tests and their functions.
Types of measures of performance.
Methods of evaluating products.
Constructing performance tests.
Using performance test results.


Teachers and other users of educational achievement tests have 
long been aware that results from paper-and-pencil tests of facts and 
information in an instructional field reveal only a part of the story 
of educational accomplishment. Admittedly such tests are easily given, 
are quickly scored, and are extremely useful in the classroom, but 
they are also limited to the degree that in many instructional areas 
results from conventional tests emphasizing facts and principles are 
not highly correlated with actual performance. The recognition of 
this fact points up one of the serious limitations in the validation of 
many otherwise excellent achievement measures, and makes quite 
obvious the need for performance tests to supplement other measures 
of achievement. 

In one form or another, performance tests are utilized in all three 
areas of measurement and evaluation treated in this volume: intelli-
gence, personality, and achievement. Such tests are used in measur-
ing general intelligence of illiterate persons and individuals having
language handicaps of other types. They are also used in the measure-
ment of aptitudes for various types of manual skills. Conduct or per-
formance tests are also extensively used in the evaluation of a wide
variety of personality characteristics and traits. This chapter is pri- 
marily concerned with the measurement of those physical and motor 
reactions that represent important behavioral and skill outcomes of 
learning and that are, therefore, evidences of educational achieve-
ment.


1 NATURE OF PERFORMANCE TESTS


Development of performance testing 


Although performance tests were used by primitive peoples in their
tests and ceremonies preceding the induction of youth into adult 
society and by the Greeks and Spartans in their athletic games, 
the objective written test came into use many years before ob- 
jective performance tests received much attention. Despite the 
fact that several of the earliest instruments for the objective meas- 
urement of educational achievement were quality and so-called 
product, or source, scales for handwriting, drawing, English compo- 
sition, and spelling, the paper-and-pencil test largely dominated in 
objective measurement until some twenty years ago. With an in- 
creasing realization on the part of psychologists and educators that 
knowledge of facts and principles is not necessarily accompanied by 
skills in their use and application, attention was directed, or perhaps 
redirected, to objective procedures for the measurement of skills in 
functional situations. 

Broadly speaking, every test is a performance test, whether the 
performance consists of oral responses to questions, written responses 
on an essay or an objective test, or the application of physical or 
motor skills in a certain test situation. However, paper-and-pencil 
tests of factual knowledge are not generally regarded as performance 
tests. Neither are oral and essay tests, for that matter, but to the 
degree that the pupil's skill in expression is evaluated they actually 
are performance tests even though manipulation as such is not in- 
volved. For the purposes of this chapter, however, performance tests 
will be considered primarily as those requiring the use and often the
manipulation of physical objects and the application of physical and
motor skills in situations not restricted to oral and written responses. 


Measurable characteristics of performance 


A distinction of importance is found among the measurable char- 
acteristics in the field of performance testing. These instructional 
outcomes appear to be largely in the areas of knowledges, concepts, 
understandings, skills, and applications. 

The knowledges, concepts, and understandings serving as necessary 
background for many types of skill performances are measured by 
what are variously called object tests, recognition tests, and identifi-
cation tests. Recognition tests and identification tests do not neces-
sarily imply the use of physical objects in the test situation but
instead may involve photographic or drawn representations of the 
articles. These testing techniques most appropriately are considered 
in the treatment of objective written tests. Object tests, on the other 
hand, imply the presentation and use in the test situation of three- 
dimensional articles. Accordingly, they are dealt with in this chapter. 

The skills and applications outcomes are measurable in some in- 
stances by written tests and in others by performance measures. Such 
skills and applications as those involved in reading comprehension, 
written expression, and arithmetic and mathematics are commonly 
evaluated by means of paper-and-pencil tests. So are some of the 
mathematical aspects of the sciences and even of the social studies. 
The aspects of performance testing for the direct measurement of 
skills and applications to be dealt with in this chapter are concerned 
with the procedures followed in performing a certain task and the 
product resulting from the completion of the task. Check lists and 
timing devices are the most widely used educational tools for evalu- 
ating the performance of the pupil, whereas quality scales, rating 
scales, score cards, and counting and measuring are commonly used 
in the evaluation of performance as it is evidenced in the completed 
product. 

Tests of performance may be classified in several useful ways. The 
one chosen here divides the instruments and techniques for perform- 
ance testing of educational achievement into: (1) object tests, (2) 
performance measures, and (3) product evaluations.


2 OBJECT TESTS


Object tests measure knowledges, concepts, and understandings 
prerequisite to functional performances of certain skills but in them- 
selves are concerned with quite tangible and sometimes formal types 
of instructional outcomes. They are sometimes called identification 
tests or recognition tests because the pupil is asked to identify or 
recognize an object, specimen, or selection presented to him in his 
capacity as an observer or as a listener. The pupil may be asked to 
identify geological, biological, or other specimens presented to him 
in actuality or in the form of photographs or line drawings. He may 
be asked to recognize musical selections played by a soloist or orches- 
tra or reproduced by a phonograph. 

Although the visual and auditory senses are the ones primarily in- 
volved in such situations, the sense of touch may also enter in the 
case of objects where knowledge concerning grain, texture, or other 
surface qualities might aid in identification. The other two of the 
five basic senses—taste and smell—may even be employed in some 
less usual situations in the practical arts and physical sciences where 
physical objects are to be identified. The object test is more func- 
tional than the comparable type of test in which only photographic 
or drawn representations of the objects are presented, for the object 
may variously be seen, felt, listened to, and even smelled or tasted, 
whereas a pictorial representation can be interpreted only in terms 
of its visual stimulus. 

The accompanying illustration from the Prognostic Test of Me-
chanical Abilities is used to represent an object test, although it actu-
ally involves the photographic presentation of objects and written
responses by the pupils. However, if the student visualizes a setting 
in which the eight tools, appropriately numbered, are actually laid 
out on a table or bench and if he changes the first questions of the 
illustration so that they relate to the actual objects presented, he will 
obtain a clear idea concerning the setting and nature of an object 
test. The first three questions of the illustration are of the identifica- 
tion or recognition type and measure factual knowledges, but the 
last three questions go beyond formal knowledges in measuring con- 
cepts and understandings concerning the nature and appropriate uses 
of the tools. 


Excerpts from Prognostic Test of Mechanical Abilities 1


31-50. DIRECTIONS: Each of the following incomplete statements or questions is followed by five
possible answers. For each item, select the answer that best completes the statement or answers the
question, and write its number or letter on the line to the right.


Statements 31-45 refer to pictures of tools above


31. A claw hammer is shown in picture   1   3   6   7   ____
32. A chisel is shown in picture   [choices illegible]   ____
33. A ball peen hammer is shown in picture   1   3   5   6   8   ____


39. Tool No. 1 can be used to:   a. file metal   b. polish metal   c. drill holes
    d. take dents out of metal   e. caulk metal   ____


40. Tool No. 2 can be used to:   a. [illegible]   b. drive a screw   c. file metal
    d. fasten a bolt   e. lock a nut   ____


41. Tool No. 3 can be used to:   a. cut wood   b. pull out a nail   c. bend a rod
    d. drive a nut   e. tighten a nut   ____


Another illustration of an object test is drawn from the field of 
home economics. The accompanying illustration shows how pupils 
used actual shirts, ties, handkerchiefs, and socks displayed on a 
screen in demonstrating certain concepts and understandings con- 
cerning the significance of color in clothing selection. 


Sample of Test in Clothing 2


Assume that the articles displayed on Screen I are to be worn with a
suit of dark-value gray, a top coat of middle-value gray, and middle-value
pigskin gloves. Choose the most becoming shirt, tie, handkerchief, and
socks for each man to wear with the gray suit, coat, and gloves. Write
the number corresponding to your choice in the blank at the left of each
item, and list no article more than once.


Articles of Clothing          Descriptions of Men
____  1. shirt                A. black hair, fair skin, and blue eyes
____  2. tie
____  3. handkerchief
____  4. socks
____  5. shirt                B. medium brown hair, blue eyes, and some-
____  6. tie                     what sallow skin
____  7. handkerchief
____  8. socks
____  9. shirt                C. auburn (red) hair, brown eyes, and florid
____ 10. tie                     complexion
____ 11. handkerchief
____ 12. socks


1 J. Wayne Wrightstone and Charles E. O'Toole, Prognostic Test of Mechanical
Abilities, Form A. Published by California Test Bureau, 1946.
2 Clara B. Arny, Evaluation in Home Economics. Appleton-Century-Crofts, Inc.,
New York, 1953. p. 143.

3 PERFORMANCE MEASURES


The process by which a pupil produces some type of desired result 
in a test situation is appropriately observed and evaluated as to quality
by use of check lists and as to quantity or time by use of a timing de- 
vice. Performance measurement is often highly diagnostic in its sig- 
nificance, but it is time-consuming in those instances in which each 
pupil must be tested individually. 


Check lists 


The processes involved in the performance of a complex skill are 
subject to measurement and evaluation by the use of observational 
techniques. Although observation unaided by any objective instru- 
ment may often be effective when the observer is a qualified specialist 
in the skill in question, the use of check lists insures a more accurate 
and comprehensive record of the actual behavior of the individual 
observed. Such check lists have been evolved for a number of skill 
performances in industrial arts, home economics, and laboratory 
science. 

Tyler illustrated procedures measurement in testing ability to use 
the microscope. The technique is necessarily used with only one per- 
son at a time because it requires full-time observation of the pupils 
by the examiner. The check list illustrated herewith includes a se- 
quential listing of the appropriate and inappropriate steps of pro- 
cedure in adjusting the instrument and finding a yeast cell or a blood 
cell on a slide, using a culture previously prepared by the examiner. 

As the student prepares the slide, endeavors to adjust the micro-
scope, and attempts to locate a cell, the examiner records his opera-
tions by numbers in sequence at appropriate places on the check
list. The result is a diagnostic version of the student's success or fail- 
ure in the assigned task. The numbers in the illustration show the 
results of such a record for an actual performance by a student. 
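
The recording procedure just described can be sketched as a simple data structure. The class and method names are hypothetical. The four actions are taken from the check list of Fig. 11, but the order in which they are observed is invented for illustration and is not Tyler's recorded case.

```python
class SequenceCheckList:
    """Record the order in which listed actions are observed."""

    def __init__(self, actions):
        # actions: dict of check-list letter -> description of the action
        self.actions = dict(actions)
        self.sequence = {code: [] for code in self.actions}
        self._next = 1

    def observe(self, code):
        """Enter the next sequence number beside the observed action."""
        if code not in self.actions:
            raise KeyError(f"action {code!r} is not on the check list")
        self.sequence[code].append(self._next)
        self._next += 1

    def report(self):
        """Return one line per action with the numbers entered beside it."""
        lines = []
        for code, description in self.actions.items():
            numbers = ", ".join(str(n) for n in self.sequence[code])
            lines.append(f"{code}. {description}: {numbers}")
        return lines


check_list = SequenceCheckList({
    "a": "Takes slide",
    "b": "Wipes slide with lens paper",
    "f": "Places drop or two of culture on slide",
    "g": "Adds more culture",
})
for code in ["a", "b", "f", "g", "b"]:
    check_list.observe(code)
print("\n".join(check_list.report()))
```

An action performed more than once, such as wiping the slide a second time in this invented record, carries more than one sequence number, which is the kind of detail that gives the record its diagnostic value.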


STUDENT'S ACTIONS

a. Takes slide
b. Wipes slide with lens paper
c. Wipes slide with cloth
d. Wipes slide with finger
e. Moves bottle of culture along the table
f. Places drop or two of culture on slide
g. Adds more culture
h. Adds few drops of water
i. Hunts for cover glasses
j. Wipes cover glass with lens paper
k. Wipes cover with cloth
l. Wipes cover with finger
m. Adjusts cover with finger
n. Wipes off surplus fluid
o. Places slide on stage
p. Looks through eyepiece with right eye
q. Looks through eyepiece with left eye
r. Turns to objective of lowest power
s. Turns to low-power objective
t. Turns to high-power objective
u. Holds one eye closed
v. Looks for light
w. Adjusts concave mirror
x. Adjusts plane mirror
y. Adjusts diaphragm
z. Does not touch diaphragm
aa. With eye at eyepiece turns down coarse adjustment
ab. Breaks cover glass
ac. Breaks slide
ad. With eye away from eyepiece turns down coarse adjustment
ae. Turns up coarse adjustment a great distance
af. With eye at eyepiece turns down fine adjustment a great distance
ag. With eye away from eyepiece turns down fine adjustment a great distance
ah. Turns up fine adjustment screw a great distance
ai. Turns fine adjustment screw a few turns
aj. Removes slide from stage
ak. Wipes objective with lens paper
al. Wipes objective with cloth
am. Wipes objective with finger
an. Wipes eyepiece with lens paper
ao. Wipes eyepiece with cloth
ap. Wipes eyepiece with finger
aq. Makes another mount
ar. Takes another microscope
as. Finds object
at. Pauses for an interval
au. Asks, “What do you want me to do?”
av. Asks whether to use high power
aw. Says, “I'm satisfied”
ax. Says that the mount is all right for his eye
ay. Says he cannot do it
az. Told to start new mount
aaa. Directed to find object under low power
aab. Directed to find object under high power


NOTICEABLE CHARACTERISTICS OF STUDENT'S BEHAVIOR

a. Awkward in movements
b. Obviously dexterous in movements
c. Slow and deliberate
d. Very rapid
e. Fingers tremble
f. Obviously perturbed
g. Obviously angry
h. Does not take work seriously
i. Unable to work without specific directions
j. Obviously satisfied with his unsuccessful efforts


CHARACTERIZATION OF THE STUDENT'S MOUNT

a. Poor light
b. Poor focus
c. Excellent mount
d. Good mount
e. Fair mount
f. Poor mount
g. Very poor mount
h. Nothing in view but a thread in his eyepiece
i. Something on objective
j. Smeared lens
k. Unable to find object


SKILLS IN WHICH STUDENT NEEDS FURTHER TRAINING

a. In cleaning objective
b. In cleaning eyepiece
c. In focusing low power
d. In focusing high power
e. In adjusting mirror
f. In using diaphragm
g. In keeping both eyes open
h. In protecting slide and objective from breaking by careless focusing


Fig. 11. Check list of student reactions in finding an object
under a microscope 3


3 Ralph W. Tyler, “A Test of Skill in Using a Microscope.” Educational Research
Bulletin, 9:493-96; November 19, 1930.

Supplementary portions of the check list provide for checking
characteristics of student behavior, characteristics of the mount, and 
skills in which the student needs further training. When the student's 
performance is summarized and diagnosed by the use of these sections 
of the check list, the special type of remediation needed is ordinarily 
disclosed and serves as the basis for providing the necessary type of 
remedial instruction. The diagnostic significance of this type of sum- 
mary is shown by the check marks in the illustration representing 
characteristics of and deficiencies in the student's performance. 

As this individual technique is time-consuming, students can first 
be tested in a group situation where success or failure in a task com- 
mon to all can be readily checked for each student by the quality of 
the adjustment he obtains. It then becomes necessary to use the check 
list only with those individuals who do not succeed in the group 
test situation. 

This illustration is representative of the work-sample tests em- 
ployed both in procedures and product measurement. The use of a 
microscope in finding a yeast cell or a blood cell may be considered 
as a sample, or work-sample, of the various ways in which a micro- 
scope is used in biological science. 


Timing devices 


A stop watch or even an ordinary watch having a second hand is 
the only timing device the teacher ordinarily needs in performance 
testing, although some specialized types of performance may require 
the use of more precise timing instruments. In performances where 
speed is an important characteristic, the time required for the per- 
formance of the assigned task may constitute one measure, although 
in some instances there may be other and even more important 
measures. In the measurement of speed and accuracy in clerical, 
typing, stenographic, and other types of performances where speed 
is an important factor, a watch becomes a tool for the measurement 
of educational achievement. 

It was pointed out in Chapter 3 that speed may be measured in 
terms of the amount or quantity of production in a given period of 
time or in terms of the time required to complete a product of a 
certain quality or to perform a task of a certain level of difficulty. 
This second method is the one ordinarily applied in performance 
testing when the job is to produce a completed product, such as a
planed and squared-up board in manual arts or a seam in home
economics. In a skill such as typing, however, the time is usually held 
constant and the quantity, as well as the quality, of the production 
is measured. 


4 PRODUCT EVALUATION 


The quality of a skill performance can be evaluated in terms of 
the characteristics of the completed product as well as by the 
characteristics of the techniques used in its production. In assessing 
the characteristics of the product, quality scales, rating scales, score 
cards, and counting and measuring techniques are usually employed. 
There are occasions when it may be desirable to evaluate both the 
procedure and the product, but in many instances a good product 
depends so much upon effective procedures that product measure- 
ment alone may suffice. When such is the case, observation of each 
pupil separately during the performance ceases to be necessary. The 
resulting products can later be evaluated individually by the teacher. 


Quality scales 


Although standardized quality scales have been devised and used 
at least to some extent for measuring a variety of skills in composi- 
tion, fine arts, industrial arts, home economics, and handwriting, it 
is probably in the last-mentioned area that they have been used 
most. The handwriting scale of the California Achievement Tests
is shown in Figure 12 to illustrate the quality scale. The pupils
taking this handwriting test write the words used as samples on the 
scale. Each pupil's handwriting quality is then evaluated by finding 
the scale sample most closely resembling his writing and assigning 
the appropriate grade or age equivalent. 


Rating scales and score cards 


Rating scales and score cards are very similar devices for use in 
measuring the quality of the product made in a test situation. A 
numerical scale typically provides for separate ratings or evalua- 
tions on the various elements of skill required in the total per- 
formance. Such distinctive features of the total performance may 
range from only a few to a large number, depending on the complexity
of the skill performance and the degree of analysis desired
in the evaluation. Numerical values representing various levels of
quality usually range from three or four to ten. It is doubtful if more 
than ten degrees of quality can be distinguished reliably in evaluat- 
ing qualitative performances. 


Fig. 12. Handwriting scale of the California Achievement Tests 4


4 Ernest W. Tiegs and Willis W. Clark, California Language Test Manual,
Intermediate. California Test Bureau, Los Angeles, 1950. p. 15.

Two samples are presented here to illustrate this method of product
measurement. The first provides for a rather highly analytic


Excerpt from Rating Form for Fastening 5


(a) Nails:
(1) Straightness ...... 1 2 3 4 5 6 7 8 9 10
Are nails driven straight, heads square with wood, no evidence of
bending?

(2) Hammer marks ...... 1 2 3 4 5 6 7 8 9 10
Is wood free of hammer marks around nails?

(3) Splitting ...... 1 2 3 4 5 6 7 8 9 10
Is wood free of splits radiating from nail holes?

(4) Depth ...... 1 2 3 4 5 6 7 8 9 10
Are depths of nails uniform and of pleasing appearance?

(5) Spacing ...... 1 2 3 4 5 6 7 8 9 10
Are nails spaced too close or too far apart?

(6) Utility ...... 1 2 3 4 5 6 7 8 9 10
Will the nails hold?


rating of a fairly simple skill. The second shows how a more complex 
skill may be evaluated in a less highly analytic manner. 


Food Score Card for Waffles 6


6. Taste and Flavor    Too sweet or flat or taste of    Pleasing flavor    SCORE
                       leavening agent or fat


5 Dorothy C. Adkins, Construction and Analysis of Achievement Tests. U. S.
Government Printing Office, Washington, D. C., 1947. p. 231.
6 Clara M. Brown, Food Score Cards: Waffles, No. 53. Published by University of
Minnesota Press, 1940.


Counting and measuring techniques


Counting most often becomes a direct product measurement tech- 
nique when the quantity of articles produced in a given time or the 
errors made in a certain piece of work are important to the total 
evaluation. Speed of production is usually not greatly stressed in the 
school classroom, although where boys and girls are receiving train- 
ing for certain types of employment, as they often are in trade and 
industrial schools, speed of performance may assume considerable 
significance. The error count in typewriting is a direct measure of 
quality in the product, and it is often combined as a penalty with 
the number of words typed per minute to obtain an evaluation of the 
total product. This procedure in effect provides a combined quantita- 
tive and qualitative score. 
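
The combination of an error penalty with words typed per minute can be sketched briefly. This is only an illustration: the text does not state how heavily each error is penalized, so the weight of ten words per error, the floor at zero, and the function name are all assumptions chosen for the example.

```python
def net_words_per_minute(words_typed, errors, minutes, penalty_words_per_error=10):
    """Return a combined quantity-and-quality typing score.

    penalty_words_per_error is a hypothetical weight, not a value from the text.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Quality enters the score as a deduction from the quantity produced.
    penalized_words = words_typed - penalty_words_per_error * errors
    # A score below zero has no useful meaning, so it is floored (an assumption).
    return max(penalized_words, 0) / minutes


if __name__ == "__main__":
    # 250 words in 5 minutes with 5 errors: (250 - 10 * 5) / 5
    print(net_words_per_minute(250, 5, 5))
```

A teacher who prefers a lighter or heavier penalty would change only the weight; the principle of a single score reflecting both speed and accuracy is unchanged.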

Measuring instruments of various types are also used in evaluating 
the quality of a product. Such devices as rules, calipers, squares, 
scales, gauges, and other instruments may be used in determining 
how accurately the pupil has performed the assigned task. Special 
mechanical testing devices may even be devised by the teacher of a 
skills subject to serve certain specific purposes. 

Newkirk and Greene illustrated a performance test in which the 
product is measured objectively to determine the quality of pupil 
workmanship on a test of accuracy in woodworking.7 After each
pupil has been assigned to a work bench, the examiner reads the 
following instructions aloud. 


Directions to Pupil. This is a test to determine how accurately you
can use woodworking tools. The wood and all necessary tools will be 
given to you. The surfaces of the block of wood are numbered 1, 2, 3, 
4, 5, 6. You will be given specific directions for doing the job and a work- 
ing drawing that gives all the necessary dimensions. Do this project as 
accurately as you can. Do not waste time, but do not work too fast to do 
your best work. The steps must be done in the order given. After you 
begin work do not ask unnecessary questions, but if you are in doubt 
about a step in the procedure or a dimension on the working drawing 
ask the examiner. Write your name and grade in school on surface No. 6 
of the test block. Do not begin work until the examiner gives the signal. 


7 Louis V. Newkirk and Harry A. Greene, Tests and Measurements in Industrial
Education. John Wiley and Sons, Inc., New York, 1935. p. 147.

Procedure:


1. Select face No. 1. Plane it square and true and to the thickness
indicated on the working drawing. When finished re-mark No. 1.

2. Select side No. 2. Plane it square and true to surface No. 1. When
finished re-mark No. 2. 

3. Select end No. 3. Plane it square and true to No. 1 and 2. When 
finished re-mark No. 3. 

4. Measure from end No. 3 toward end No. 5, and square a sharp 
pencil line across the block to the length indicated in the working draw- 
ing. Saw off the waste material with a back saw so that the stock will 
be as nearly the required length as you can make it. Do not plane. Re- 
mark end No. 5. 

5. From edge No. 2 gauge a line the length of the block, allowing
the exact width as indicated on the working drawing. Rip as nearly the 
exact width as possible. Do not plane. 

6. On surface No. 1 lay out the center for the hole and bore. 

7. When you have finished take your block to the examiner. 


The examiner then supplies a copy of the working drawing, reproduced
in Figure 13, to each pupil and makes sure that he has the
necessary tools and a piece of standard wood stock prepared in
advance by the instructor. The directions to the examiner, given last
here for purposes of simplicity of presentation but obviously familiar
to the examiner in advance, serve to illustrate the remaining steps
in test administration.9


Fig. 13. Working drawing of a wood block 8


Directions to Examiner: It is essential that the pupil shall understand
the exact procedure, and that he be able to visualize how the block is to
look when finished. The following directions are recommended:


8 Ibid. p. 146.
9 Ibid. p. 146-47.

1. Read aloud and distinctly the directions to the pupil while the
class follows silently. Answer any questions about the directions at this 
point. 

2. Show the pupils a completed test block, and if they care to, let 
them examine it. 

3. When there are no further questions, say, “Get ready. Hold up 
the test block. Begin work." 

4. During the examination answer any questions about the steps in 
the procedure by rereading the step in question with the pupil. 

5. Observe the pupils as they work to make certain that they are 
doing all the steps in the correct order. 

6. Make certain that the proper tool is used where indicated, but 
do not tell the pupil how to use the tool. 

7. Help any pupil having difficulty in interpreting the working draw- 
ing, but do not make any measurements on the test block for the pupil. 
This test measures ability to measure to И в in. with a ruler, but is not a 
measure of ability to read drawings. 

8. Take in the test block when the pupil has finished. The time is not 
important, for this is a test of quality or accuracy as it applies to modify- 
ing wood with simple hand tools. 

The authors specified the use of a try square, a 14-inch dowel 3
inches long, and a scale graduated in sixty-fourths of an inch in
evaluating the pupil products. They suggested that for each rated
dimension 10 points be assigned for an exact measurement and 1
point be deducted for each 1/64 inch of deviation from the specified
dimension.10 This final step illustrates the application of measuring
instruments to the evaluation of the final product.
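
The deduction rule for a rated dimension lends itself to a short worked sketch. It assumes the unit of deviation is one sixty-fourth of an inch, the graduation of the scale named above, that only whole sixty-fourths are counted, and that a score cannot fall below zero. The dimension names and values in the usage lines are hypothetical, since the working drawing is not reproduced here.

```python
from fractions import Fraction

UNIT = Fraction(1, 64)   # assumed unit of deviation, in inches
FULL_CREDIT = 10         # points for an exact measurement


def dimension_score(specified, measured):
    """Score one rated dimension: full credit less 1 point per unit of deviation."""
    deviation = abs(Fraction(measured) - Fraction(specified))
    points_off = int(deviation / UNIT)   # whole sixty-fourths of deviation
    return max(FULL_CREDIT - points_off, 0)  # floor at zero is an assumption


def block_score(specified_dims, measured_dims):
    """Sum the scores of all rated dimensions of one test block."""
    return sum(
        dimension_score(specified_dims[name], measured_dims[name])
        for name in specified_dims
    )


if __name__ == "__main__":
    # Hypothetical dimensions in inches, written as exact fractions.
    specified = {"thickness": "3/4", "width": "2"}
    measured = {"thickness": "49/64", "width": "125/64"}
    print(block_score(specified, measured))
```

Exact fractions are used rather than decimals so that a reading such as 49/64 inch is compared with the specification without rounding error.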

Arny illustrated a functional situation in which each home eco- 
nomics pupil is given a miniature dress pattern such as that shown 
in Figure 14, a piece of paper to represent cloth from which a dress 
might be made, and the other necessary tools and materials used in 
the actual process of cutting dress material from a pattern and 
preparing it for sewing. To simulate conditions in which a dress
would actually be made, it was recommended that the paper used in 
lieu of dress fabric have a design on one side, to represent the right 
side of dress material, that its length be three times its width, to 
represent three yards of cloth thirty-six inches wide, and, of course, 
that it be appropriate in size to the miniature pattern. Each girl 
would be expected to 11

10 Ibid. p. 147-48. 


11 Clara B. Arny, Evaluation in Home Economics. Appleton-Century-Crofts, Inc.,
New York, 1953. p. 84.

cut out the pieces of the pattern and pin them on the paper strip, and
then to draw the outline of the other half of each piece of the pattern 
in its proper location. 


The resulting products could be evaluated by the teacher, using a 
rating scale or score card prepared specifically for this type of skill 
performance. This illustration of product measurement is of the 
simulated conditions or miniature type. When it is not practicable 
to employ the real situation in which a certain functional skill is 
employed, it is sometimes possible, as here, to simulate the conditions 
by the use of a miniature test representing the real situation quite
accurately.


Fig. 14. Diagram of pieces of a dress pattern 12


5 CONSTRUCTING PERFORMANCE TESTS 


While performance and other types of manipulative tests have 
been widely used in certain educational fields, such as the industrial 
arts and home economics, the practical reliability of many of these 
devices has not been very satisfactory. It is believed that a part of 
this difficulty arises from the fact that too many of the better-known 
paper-and-pencil testing techniques have been uncritically borrowed 
and used without the necessary technical and administrative modifi- 
cations required for effective testing in the specialized field. It is not 


12 Ibid. p. 85.

possible in this brief chapter to present examples of performance
tests in many different areas of educational achievement, but a 
brief summary of certain general procedures that appear to be neces- 
sary in the construction of performance tests may be helpful. 

The following summary of steps in preparing performance tests 
is taken with only minor modifications from a statement by Newkirk 
and Greene.13 While these suggested steps were originally designed
for use in classes in industrial education, there is little reason why 
they may not be expected to function equally well in other fields 
in which performance testing is desirable: 


1. Make a job analysis of the activities covered in the course of
study to determine exactly what qualities may be tested. 

2. Select the performance tasks that best represent the job. 

3. Decide what tools, materials, and equipment are necessary for the 
testing situation. 

4. Decide what elements in performance are to be evaluated: (a) 
performance in process or (b) product of performance. 

5. Prepare a number of test exercises or make a composite exercise 
that will offer the pupil an opportunity to provide an adequate sample 
of his work with each tool or instrument and type of material it is desired 
to test. 

6. Make a statement of procedure that tells the pupil exactly what 
to do in a vocabulary that is comprehensible at his grade level. 

7. Prepare a set of general directions for the pupil before the test is 
administered. 

8. Prepare directions for the examiner. 

9. Devise methods of scoring the test or evaluating the product that 
will provide an adequate measure of the results of each tool or instru- 
ment. 

10. Try out the test on a few students, and make the more obvious
changes and corrections in its content or directions. 

11. Make two or more approximately equal forms of the test. 

12. Try out the test, and compute the reliability coefficient, standard 
deviation, and standard error of a score, along with the grade and per- 
centile or other types of norms. 
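
The statistics named in the last step can be illustrated with a small sketch. It assumes that the reliability coefficient is estimated as the correlation between pupils' scores on the two forms prepared in step 11, and that the standard error of a score is computed by the familiar formula SD x sqrt(1 - r). The function names are hypothetical, and no particular procedure is prescribed by the steps themselves.

```python
import math


def mean(scores):
    return sum(scores) / len(scores)


def standard_deviation(scores):
    """Standard deviation of a set of scores (divisor N)."""
    m = mean(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))


def pearson_r(form_a, form_b):
    """Correlation between scores of the same pupils on two forms."""
    if len(form_a) != len(form_b):
        raise ValueError("both forms must have one score per pupil")
    mean_a, mean_b = mean(form_a), mean(form_b)
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(form_a, form_b))
    spread_a = sum((a - mean_a) ** 2 for a in form_a)
    spread_b = sum((b - mean_b) ** 2 for b in form_b)
    if spread_a == 0 or spread_b == 0:
        raise ValueError("scores on each form must vary")
    return cross / math.sqrt(spread_a * spread_b)


def standard_error_of_score(sd, reliability):
    """Standard error of a score: SD times the square root of (1 - r)."""
    return sd * math.sqrt(1 - reliability)


if __name__ == "__main__":
    # Hypothetical scores of five pupils on two forms of a performance test.
    form_a = [10, 12, 14, 16, 18]
    form_b = [11, 12, 15, 15, 19]
    r = pearson_r(form_a, form_b)
    sd = standard_deviation(form_a)
    print(round(r, 3), round(sd, 3), round(standard_error_of_score(sd, r), 3))
```

With so few pupils the figures mean little; the sketch shows only how the three quantities are related.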


A simple illustration of the manner in which certain of these steps 
are used in the construction of a performance test is given on pages 
210 to 212 of this chapter. 


13 Newkirk and Greene, op. cit. p. 145.


6 USING RESULTS OF PERFORMANCE TESTING


The problems of using and interpreting performance test results 
are so nearly identical with those involved in the use of results from 
other types of tests that they need no special treatment here. In 
fact, performance test results are important to the teacher primarily 
because they supplement other important measures of ability or 
accomplishment. 


Topics for Discussion 


1. List and discuss specific ways in which performance tests of
accomplishment supplement information provided by other objective
achievement tests. 

2. Why should performance test results be particularly valuable in the 
testing of intelligence and personality qualities? 

3. Discuss the truth of the statement that “every test is a performance 
test." 

4. What are the chief limitations of performance testing procedures? 

5. Discuss the three major types of performance testing techniques.

6. Show how quality scales, such as those for handwriting, drawing,
lettering, sewing, soldering, splicing, and other industrial arts areas,
are essential to the effective measurement of performance.

7. Discuss in some detail the adequacy with which the twelve steps in 
constructing a performance test presented on page 214 are applied 
in the case of the example given in this chapter. 

8. Set out the specifications for the construction, use, and interpretation
of a performance test in some other field than that illustrated in this
chapter.


THIS CHAPTER presents a treatment of the following aspects of evaluative
tools and techniques:


The nature and characteristics of evaluation.

Tests of abilities to interpret.

Tests of practices and activities.

Pupil profile, progress, and class analysis charts.

Cumulative records and report cards.

The interview and the questionnaire.

Evaluation in the classroom.


Evaluation is usually thought of as a broadly inclusive term. 
Accordingly, all types of tests, non-test tools, and techniques used 
in pupil appraisal may be considered evaluative. However, many of 
these instruments and techniques are treated elsewhere in this 
volume—achievement tests in Chapters 5 to 8, intelligence and 
aptitude tests in Chapter 10, and personality measures in Chapter 
11. The line of demarcation between evaluative instruments and
achievement tests on the one hand and evaluative techniques and 
personality measures on the other hand is not definite. The attempt 
is made in this chapter, therefore, to deal only with those tests, tools, 
and techniques not treated elsewhere in this volume but deemed 
most appropriate for use in evaluating and appraising pupil
achievement.


1 MEANING OF EVALUATION


Evaluation is a concept still relatively new to education. The term 
has been used to include appraisal of the school program, curriculum, 
and instructional materials, appraisal of the teacher, and appraisal 
of the school child. Its methods run the gamut from observation and 
testing to elaborate research techniques. Evaluation as dealt with 
here is concerned in the direct sense only with characterizations of the 
school child through testing, measuring, and appraising.


Nature of evaluation 


Wrightstone's definition exemplifies the point of view appropriate 
when the school child is the focus of evaluation. 


Evaluation is a relatively new technical term, introduced to designate 
a more comprehensive concept of measurement than is implied in con- 
ventional tests and examinations...the emphasis in measurement is 
upon single aspects of subject-matter achievement or specific skills and 
abilities, but...the emphasis in evaluation is upon broad personality 
changes and major objectives of an educational program. These include 
not only subject-matter achievement but also attitudes, interests, ideals, 
ways of thinking, work habits, and personal and social adaptability.1


This definition supports the distinctions made in Chapter 1 among 
testing, measuring, and evaluating. The terms represent successively 
more inclusive and meaningful approaches to the appraisal of pupils. 
Evaluation thus includes not only the methods and tools appropriate 
in testing and measuring but also a variety of procedures and instru- 
ments of broader scope. 


Characteristics of evaluation 


A somewhat more adequate understanding of evaluation can be 
obtained by considering its characteristics. Wrightstone characterized 
evaluation by stating its purposes and methods. 


First, it attempts to measure a comprehensive range of objectives of
the modern school curriculum rather than limited subject-matter
achievement only. Second, modern evaluation uses a variety of techniques
of appraisal such as achievement, attitude, personality and character
tests... rating scales, questionnaires, judgment scales of products,
interviews, controlled-observation techniques, sociometric techniques and
anecdotal records. Third, evaluation includes integrating and interpreting
the various indexes of behavior changes so as to construct an inclusive
portrait of an individual.... For this purpose a comprehensive
cumulative record is valuable.2


1 J. Wayne Wrightstone, "Evaluation." Encyclopedia of Educational Research,
Revised edition. Macmillan Co., New York, 1950. p. 403.


A somewhat more extensive characterization is that presented by 
Quillen and Hanna, who stated that: 


1. Evaluation includes all the means of collecting evidence on student
behavior.

2. Evaluation is more concerned with the growth which the student
has made than with his status in the group... .

3. Evaluation is continuous ...an integral part of all teaching and
learning.

4. Evaluation is descriptive as well as quantitative.

5. Evaluation is concerned with the total personality of the student
and with gathering evidence on all aspects of personality development.

6. Evaluation is a cooperative process involving students, teachers, and
parents.3

These characterizations of evaluation justify the concern here with 
evaluation as distinguished from testing and measuring. For that 
reason the treatment of this chapter should be considered particularly
in relation to Chapters 5 to 8.



2 EVALUATIVE TESTS 


The distinction between a test that measures and a test that 
evaluates is by no means exact. It is too much to demand that a test 
meet all of the characteristics outlined in the above section of this 
chapter, for testing techniques have not yet provided, and in fact 
may never provide, single instruments so broadly conceived. The 
tests considered here to be evaluative may in general be distinguished 
from tests that measure by their greater emphasis on the less tangible 


2 J. Wayne Wrightstone, "Trends in Evaluation." Educational Leadership, 8:91-95;
November 1950.
3 James Quillen and Lavone A. Hanna, Education for Social Competence. Copyright
by Scott, Foresman and Co., Chicago, 1948. p. 343-46. Reprinted by permission.

or even intangible types of instructional outcomes, on the types of
outcomes resulting from broad and varied learning experiences, and 
on the ability of the pupil to apply and to use information in reasoning
and problem-solving. Such tests are here considered as of two
types: (1) interpretive tests and (2) tests of practices and activities. 
The tests illustrated and discussed below are only in small degree 
representative. The wide variety of techniques used and the length 
of adequate illustrations preclude a wider sample. 

Most of the tests of these types available in published form are 
for use in the high school or college. This is perhaps because to some 
extent they embody the philosophy of general education, so far con- 
sidered most applicable to levels above the intermediate grades. A 
few such instruments are available for use in the junior high school 
and intermediate grades, however. Moreover, many of these tests are 
not provided with norms, since norms for tests of quite intangible 
outcomes appear to lack precise meaning. 


Interpretive tests 


Tests measuring abilities to interpret are similar to reading com- 
prehension tests in that both include not only the test items but also 
the material to which the test items refer. This material is usually 
in verbal form for reading tests and often consists of a paragraph or 
short selection on which the test items are based. It may but often 
does not appear in verbal form for interpretive tests, as tabular and 
graphical materials are frequently the basis for the interpretations 
the pupils are asked to make. Furthermore, reading tests typically 
measure ability to answer questions of fact or to distinguish major 
ideas in the selection, whereas interpretive tests measure such complex
abilities as are involved in the interpretation of data, application
of scientific principles, logical reasoning, and critical thinking.

The accompanying excerpt from the lower level of the Interpretation
of Data Test, designed for use in the junior and the senior high
school, represents a rather complex test unit based on a chart. Pupils
are instructed to respond by answering "T" if enough information is
given to make the statement true, "F" if enough information is
given to make the statement false, or "U," meaning uncertain, if
insufficient information is given to warrant a decision. Items in the
upper level of this test, designed for use in the senior high school and
college, discriminate more finely than does the lower-level test by
providing also for probably true and probably false responses on a 
five-point answer scale. 


Excerpt from Interpretation of Data Test 4


PROBLEM IV: THE JOBS YOUTHS WANTED AND THE JOBS THEY GOT (1934-1949).

[Pictograph: for each occupational group, one row of blocks under "What They
Wanted" and one under "What They Got." Legible row labels: Professional and
Technical, Managerial, Skilled, Semiskilled, Unskilled, Domestic, Relief
Project. Each block represents about three per cent of the youths.]


Statements. 
31. The vocations requiring semiskilled labor employed more youths than any other single type of
work.

32. About one tenth of the youths were employed in jobs of a professional-technical and skilled 
character. 

33. Youths who worked on relief projects were not fulfilling their ambitions. 


34. The professional and technical fields were too overcrowded to employ all the youths who wished 
to enter these fields. 


Two excerpts from the Watson-Glaser Tests of Critical Thinking
are shown herewith. The first is designed to measure discrimination
of arguments as strong or weak and the second, having items in 
multiple-choice form, measures applied logical reasoning. This test 
also includes parts on generalizations, inferences, recognition of as- 
sumptions, and on certain types of attitudes and opinions. 


4 Eight-Year Study of the Progressive Education Association, Interpretation of
Data Test, Lower Level. Published by Educational Testing Service, 1950.

Excerpts from Watson-Glaser Tests of Critical Thinking 5


VI. Can rich and poor obtain, on the whole, equal justice from the courts in the United States today?

21. No; for all governmental agencies in a capitalistic society are fundamentally
designed to protect the privileges of the owning class. .... Strong (41) Weak (42)

22. No; there are many dramatic cases illustrating prejudice against the poor. .... Strong (43) Weak (44)

23. Yes; judges take an oath to support the law and the Constitution without fear
or favor. .... Strong (45) Weak (46)

24. Yes; when a poor man sues a rich man or a large corporation, the jury's
sympathies are more likely to be with the poor man, thus balancing any other advantage
which the rich man may have. .... Strong (47) Weak (48)


17. John asserted that all races are alike in abilities; only 
differences in opportunity make some better educated, more 
artistic, more honest, or more successful than others. Jim 
answered, "John, you're crazy! The idea that Negroes, Indians,
and Japanese all have the same ability and talent and
character that white people do is ridiculous. Anyone with the 
least bit of common sense should not make such a foolish state- 
ment." 


We may properly conclude that — 

81 Jim believed that there are differences between the 
abilities of white people and the abilities of Negroes, 
Indians, and Japanese. 

82 John failed to take account of the history of the 
accomplishments of the various races. 

83 Jim understood the facts better than John did. 

84 John understood the facts better than Jim did. 

85 None of the above conclusions properly follows from 
the information given. 


The last illustration of this type to be given here is from the Logical 
Reasoning Test, one of the evaluation instruments of the Eight-Year 
Study of the Progressive Education Association, designed for use in 
Grades 10 to 12. The directions to the pupils appearing in the two
boxes of the accompanying excerpt indicate that the pupil is asked 
not only to choose the appropriate conclusion from the three pro- 
vided but also to evaluate statements—four of the actual twelve 
appear in the excerpt—on their significance for the conclusion chosen. 


5 Goodwin Watson and Edward M. Glaser, Watson-Glaser Tests of Critical
Thinking: (1) Test 3, Discrimination of Arguments, and (2) Test 8, Applied Logical
Reasoning. Published by World Book Co., 1942.

Excerpt from Logical Reasoning Test 6


Problem IV. 
In 1940, W. Gibson Carey, Jr., then president of the Chamber of Commerce of the United States, made the 
following statements in an address delivered at Waukegan, Illinois: 
“Our primary job is to kill excess government spending lest it kill us. Excess spending is in- 
evitable when the central government takes over many of the duties which rightly belong to the 
states. The central government of the United States has taken over many of the duties which 
rightly belong to the states."


Directions: Examine the conclusions given below. If the statements made by Mr. Carey are true, which
one of the conclusions do you think is justified?


Conclusions 


X. If the central government had not taken over many duties which rightly belong to the states, excess 
spending would have been avoided. 


Y. Since the central government took over many duties which rightly belong to the states, it was not 
possible to avoid excess spending. 


Z. Further information is needed before any logical conclusion can be drawn. 


Mark in column  A: Statements which explain why your conclusion is logical.
                B: Statements which do not explain why your conclusion is logical.
                C: Statements about which you are unable to decide.


Statements 


1. Before any logical conclusion can be reached, one must know whether the state governments would 
be more efficient than the central, or federal, government. 


2. If one removed the fundamental cause for excess spending, excess spending would be avoided.


3. Since we accept the statements made by Mr. Carey as true, excess spending could not be avoided 
if the central government took over many duties which rightly belong to the states. 


4. If centralization of government always leads to excess spending, then when centralization of
government occurs, excess spending will also occur.


Tests of practices and activities 


Paper-and-pencil tests of practices and activities must of neces- 
sity consist of verbalized statements or questions to which the pupils 
react instead of direct measurement. However, pupils’ responses to 
tests of this type may well disclose information that can lead to 
inferences concerning their individual interests, personalities, and 
adjustment. In fact, such tests are similar in some respects to the 
adjustment inventories treated in Chapter 11. 

A few items of the Health Activities Inventory are shown in an 
accompanying excerpt together with the instructions for Parts I 
and II. Part I is designed to obtain information concerning a stu- 
dent’s participation in desirable and undesirable health activities, 
whereas Part II measures his estimate of the validity of certain 
health practices. This instrument is one of the six health inventories 


6 Eight-Year Study of the Progressive Education Association, Logical Reasoning
Test. Published by Educational Testing Service, 1950.
developed by the Cooperative Study in General Education for use
from Grades 9 to 16. Illustrations from the Health Attitudes and
Health Interests inventories appear in Chapter 11 and a brief
discussion of the entire series appears in Chapter 21.


Excerpts from Health Activities Inventory ⁷

Part I

Directions: The following list of 100 items represents some of the activities in which people engage or
which may affect them. Read each item, and on the appropriate line of the answer sheet blacken the space
under

R if it is a practice which you follow regularly or if it happens to you frequently,
O if it is a practice which you engage in occasionally or if it happens to you occasionally,
N if it is a practice which you never engage in or if it never happens to you.


1. Squeeze a blister to remove its watery content. 
2. Treat skin disorders (such as acne) with common drugstore preparations not prescribed by a physician. 
3. Pick blackheads, pimples, etc., with a needle or some other sharp object. 
4. Remove moles or warts yourself. 
5. Use a salve or a liquid for bleaching the skin.

Part II

After you have answered these 100 items according to the directions for Part I, go back to the first item
and, starting with space 101 on the answer sheet, mark the first 70 items again according to the following
directions: Blacken the space under

S if you believe the practice to be a sound one; that is, if you believe the practice is substantiated
by science principles;
D if you are doubtful about the soundness of the practice; that is, if you are not sure whether the
practice can be substantiated or is contradicted by science principles;
U if you believe the practice to be an unsound one; that is, contrary to science principles.


The Inventory of Personal-Social Relationships is illustrated 
herewith by the directions and a few sample items from Part I, on 
activities and interests. Part II deals in quite similar manner with 
the students’ concerns and difficulties. This inventory, for use from 
Grade 9 through the college years, is designed to measure develop- 
ment of the individual student in the area of personal-social rela- 
tionships. Brouwer outlined steps for summarizing group results 
and analyzing an individual student's responses on the parent 
instrument, only slightly revised in the present edition.⁸


7 Cooperative Study in General Education, Health Activities, Health Inventory
No. I. Published by Educational Testing Service, 1950.

8 Paul J. Brouwer, Student Personnel Services in General Education. American 
Council on Education, Washington, D. C., 1949. p. 190-204. 
Excerpt from Inventory of Personal-Social Relationships ⁹


PART I. ACTIVITIES AND INTERESTS


Directions:
Mark your responses as follows in the A U D column at the left-hand side of the page:


(A) U D = Draw a circle around the A if the item represents an activity in which you
participate or something you do, either occasionally or frequently.


A U (D) = Draw a circle around the D if the item represents something you rarely or never do
AND toward which you are more or less indifferent.


DO NOT OMIT ANY ITEMS. IF YOU ARE DOUBTFUL ABOUT YOUR RESPONSE, MAKE
THE BEST QUICK DECISION YOU CAN. DO NOT PAUSE TOO LONG ON ANY ONE
STATEMENT.


A U D 1. Going to a student “hangout” with friends for a Coke, a snack, etc.
A U D 2. Singing in a glee club, chorus, quartet, or similar musical group.
A U D 3. Playing on an organized athletic team (varsity or intramural).
A U D 4. Attending student-faculty teas.
A U D 5. Going to dances.


3 OTHER EVALUATIVE TOOLS 


Evaluative tests of the types represented in the preceding section 
are of more recent origin than are most of the evaluative tools to 
receive consideration here. The evaluative significance of such tools 
as the pupil profile and pupil progress charts, the cumulative record, 
the report card, and the class analysis chart lies much more in the 
broadened conceptions employed in their construction and use than 
in their uniqueness. Each of these tools is briefly discussed below. 


Pupil profile chart 


Pupil profile charts are provided with many standardized tests of 
general achievement and many diagnostic tests in order to show 
differences in achievement levels graphically. The charts, frequently 
providing places for various part and total scores and their graphical 


9 Cooperative Study in General Education, Inventory of Personal-Social Relationships.
Published by Educational Testing Service, 1950.
representation, often appear on the front covers of the test booklets
or on pupil answer sheets. It is often recommended by test authors
that the answer sheet or the cover of the booklet, which also carries
such information as the pupil's name, the name and form of the test 
used, and the date on which it was given, be filed in the pupil's 
cumulative record folder or elsewhere for future reference and use. 
The following illustration of a pupil profile chart is representative 
of the charts provided with most general achievement tests. The 
illustration shows the method of using a profile for results from a 


[Figure: diagnostic profile chart on which the student's scores are charted against Grade Placement and Percentile Rank scales. Rows shown include B. Symbols and Rules, C. Numbers and Equations, D. Problems, and a total (A+B+C+D); E. Addition, F. Subtraction, G. Multiplication, H. Division, and a total (E+F+G+H) under Arithmetic Fundamentals; and Total Arithmetic.]

Fig. 15. Sample profile chart for the California Arithmetic Test ¹⁰


single test that is part of a general achievement battery. It provides
for the listing of scores on the achievement test and for a profile
showing achievement levels on the total test, on its two major areas, 
and on its eight parts. Relative strengths and weaknesses appear 
graphically for ready observation by the teacher. 


Pupil progress chart 


Evidence of pupil progress as measured by achievement tests over
a period of years can be presented graphically by the use of a pupil
profile chart. An illustration of this procedure is shown in Figure
16 for a pupil tested for three successive years by the Metropolitan
Achievement Tests. The pupil’s average achievement is shown by
the broken horizontal lines and his school grade level by the heavy
horizontal lines. His mental and chronological ages at the times he
was tested are shown under “Age,” the years and months of mental
age being indicated by circles. The trends of each profile indicate
the pupil’s relative strengths and weaknesses at a given time, whereas
the vertical differences between successive profiles are indicative of
his growth in the subject areas tested.


10 Ernest W. Tiegs and Willis W. Clark, Manual for California Arithmetic Test,
Intermediate. California Test Bureau, Los Angeles, 1951. p. 4.

11 Richard D. Allen and others, Supervisor’s Manual: Metropolitan Achievement
Tests. World Book Co., Yonkers, N. Y., 1935. p. 32.
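
The reading of growth from successive profiles can be illustrated numerically. The following is a minimal sketch only; the subjects, testing years, and grade-equivalent scores are hypothetical and are not taken from the Metropolitan Achievement Tests or from Figure 16.

```python
# Hypothetical grade-equivalent scores for one pupil,
# tested in three successive years (years numbered 1 to 3).
profiles = {
    1: {"Reading": 4.8, "Arithmetic": 5.2, "Spelling": 4.5},
    2: {"Reading": 5.9, "Arithmetic": 6.0, "Spelling": 5.1},
    3: {"Reading": 7.1, "Arithmetic": 6.7, "Spelling": 6.4},
}


def growth_between(earlier, later):
    """Difference in grade equivalents, subject by subject,
    between two profiles (the 'vertical difference')."""
    return {subject: round(later[subject] - earlier[subject], 1)
            for subject in earlier}


years = sorted(profiles)
growth = {
    (first, second): growth_between(profiles[first], profiles[second])
    for first, second in zip(years, years[1:])
}

for (first, second), gains in growth.items():
    print(f"Year {first} to year {second}: {gains}")
```

Each printed line corresponds to the gap between two adjacent profile lines on a progress chart; a small gain in one subject beside larger gains in the others would point to an area deserving the teacher's attention.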


Cumulative pupil record 


An adequate system of cumulative pupil records is almost essential 
if the program of a school is to be effective. Many schools apparently 
keep no cumulative pupil records other than of background facts con- 
cerning the pupil and his parents and of scholastic success, but other 
schools have comprehensive and even elaborate systems of cumula- 
tive records that provide for the recording of a wide variety of data 
concerning each pupil in a cumulative record folder. Many types of 
variations between these extremes are also found. 

No attempt is made here to catalog all of the types of information 
for which cumulative records should make provision. It is sufficient 
to indicate that the records should contain information about the 
pupil’s family background and environment, personal history, health, 
personality, intelligence, special abilities, school progress, scholarship, 
achievement test performances, extra-curricular activities, employ- 
ment, educational plans, and vocational ambitions. Some record 
systems provide for the recording of data on all or most of these 
points on a record card or folder, and also for the filing of certain 
types of other data, such as test profiles and scores, anecdotal records, 
case studies, and reports of action taken on special problems, in the 
cumulative record folder.


ehensive picture of a pupil can 
that can be recorded on such a 
It is impossible here to discuss at all adequately the values and
uses of the cumulative record in pupil guidance and adjustment. 
However, it should be apparent that the mere availability of such a 
variety of information for all pupils in a school as can be recorded on 
the pictured type of cumulative record is of great value. Such 
records are useful to administrators, to guidance workers, and to 
teachers as a basis for careful analyses in cases of maladjustment or 
disciplinary difficulties, and on others of the many occasions requir- 
ing or at least making desirable comprehensive information about 
individual pupils. 


Pupil report card 


Although the traditional report card may be considered to be
an evaluative tool, the better modern report cards merit that designa- 
tion much more definitely. The report card presents to the pupil and 
his parents a series of evaluations of his scholastic success and fre- 
quently of other aspects of his school performance. Because report 
cards are so widely known and because their organization and 
content differ so greatly, it seems neither desirable nor feasible to 
discuss further or to illustrate this evaluative tool here. 


Class analysis chart 


Class analysis charts are valuable tools in the summarization of 
results from testing. Although such charts as are provided with
standardized tests vary greatly, they usually provide a means of
showing median achievement for the class or pupil group and the
position of each pupil in the group in relation to age norms, grade
norms, or both, for elementary-school tests. High-school tests more 
frequently provide for the graphical representation of median group 
performance and individual pupil status in relation to grade norms 
or percentile norms. The following illustration and discussion are 
based on a class analysis chart which is rather typical of those usually 
provided with general achievement test batteries. 

The chart reproduced on pages 232 and 233 gives an analysis of the 
results from the use of the Metropolitan Achievement Tests with a 
class of 22 pupils in the second month of the sixth grade. Median class 
standing is shown by the crosses for achievement on the entire test 
and in the various subjects for which the test provides, as well as for 
intelligence quotients and chronological and mental ages. The line 
Fig. 17. Sample American Council on Education


connecting the crosses provides a profile of median achievement. 
Individual pupils are designated by identifying numbers placed at 
positions on the chart representing their scores. The distributions
of intelligence quotients in column B and of mental ages in column 
E are related to but not really a part of the chart proper. At the 
bottom of the chart is shown the median achievement in terms of 
grade equivalents. 
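
The summary figures that such a chart displays can be computed directly. The following is a minimal sketch, assuming a hypothetical class of five pupils identified by number, as on the chart; the subjects and grade-equivalent scores are invented for illustration and are not taken from the Metropolitan Achievement Tests.

```python
from statistics import median

# Hypothetical grade equivalents, keyed by subject and then by
# each pupil's identifying number.
scores = {
    "Reading":    {1: 5.8, 2: 6.4, 3: 6.1, 4: 7.0, 5: 5.5},
    "Arithmetic": {1: 6.0, 2: 6.3, 3: 5.9, 4: 6.8, 5: 5.2},
}

# Median class standing in each subject (the "crosses" on the chart).
class_median = {subject: median(by_pupil.values())
                for subject, by_pupil in scores.items()}

# Pupils whose scores fall below the class median in each subject.
below_median = {
    subject: sorted(pupil for pupil, score in by_pupil.items()
                    if score < class_median[subject])
    for subject, by_pupil in scores.items()
}

for subject in scores:
    print(subject, "median:", class_median[subject],
          "below median:", below_median[subject])
```

The same tabulation, extended to all subjects of a battery, yields the profile of median achievement and identifies the pupils who stand low on particular tests.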

Among the significant interpretations possible from this type of 
chart are those involving comparisons of median grade equivalents 
Cumulative Record for Elementary and Secondary Schools ¹²


for the various tests, those showing the range of pupil ability on 
the various tests, and those identifying the individual pupils who 
achieve at high levels, who achieve at low levels, or who show great 
individual variability on the different tests. These are only some of 
the more obvious types of interpretations of data on a class analysis 
chart, but these few perhaps illustrate the uses to which such an 
evaluative tool can advantageously be put. 


12 Cumulative Record for Elementary and Secondary Schools. American Council 
on Education. 
Fig. 18. Sample class analysis chart for


4 EVALUATIVE TECHNIQUES


To be distinguished from tests and tools used in pupil evaluation 
are two major techniques used for the same general purpose: (1) 
the interview and (2) the questionnaire. The questionnaire is
considered here, even though it is per se an instrument, because
questionnaires typically are constructed to meet each need as it arises and
are not found available in generalized published form.
the Metropolitan Achievement Tests ¹³


The interview 


The interview deserves only brief attention here, for the teacher
is not directly concerned with it in its formal sense. The interview
may, however, be informal and it may deal only with the areas of the
child's interests, needs, and background about which the teacher
needs information. Even in such informal uses of the interview as
may be of concern to the teacher, it is essential for best results that
rapport be established between the teacher and the child. A frightened
or an antagonistic pupil is not a good subject for an interview.
Therefore, the teacher should give the same type of attention to the
establishment of rapport that is necessary prior to the administration
of individual intelligence tests. Pupils should not be questioned
on many types of issues in the presence of a third party, for their
responses might then be less frank and spontaneous than if they
were questioned in privacy.


13 Gertrude H. Hildreth, Manual for Interpreting Metropolitan Achievement
Tests. World Book Co., Yonkers, N. Y., 1948. p. 56-57.

In this broad sense, the interview is widely useful and flexible. 
However, it extends possibilities to the teacher for learning more 
about his pupils and consequently aids him in attempting to effect 
the best adjustment possible for each pupil. 


The questionnaire 


Questionnaires have been very widely used, and all too frequently 
misused, in the attempt to evaluate some aspect of the school pro- 
gram and to measure certain intangible types of pupil behavior. 
Questions of fact appear in such instruments when they are used in 
obtaining simple, factual information or when they are used, for 
example, in obtaining information concerning pupil activities. Certain 
intangible tastes and preferences, primarily in the areas of attitudes 
and interests, are also measurable by the use of questionnaires. 
Adjustment inventories also typically make use of this technique. 
Part I of the Health Activities Inventory, treated in a preceding 
section of this chapter, and the several attitudes, interests, and 
adjustment inventories dealt with in Chapter 11 illustrate the use
of this technique in the evaluation of pupil behavior. 


5 EVALUATIVE TOOLS AND TECHNIQUES IN THE CLASSROOM 


It should be apparent from the preceding discussion and illustra- 
tions that the equipment of the classroom teacher may well include 
evaluative tests, tools, and techniques as well as the more traditional 
types of measuring instruments and techniques. As has been made 
clear in Chapters 7 and 8, the classroom teacher very properly may 
construct informal objective tests and performance tests to meet his 
particular needs and to supplement standardized achievement tests 
and scales. He can also participate in the cooperative planning and
development of most of the evaluative instruments and techniques 
treated in this chapter. However, the development of some types of 
test units, such as those illustrated for the Interpretation of Data 
Test and the Logical Reasoning Test, involves principles of test con- 
struction going somewhat beyond those presented in Chapters 5 and 
7 and entails a considerable amount of experience on the part of the 
test maker. As Ebel commented, “Skilled, experienced item writers
find it difficult to construct interpretive exercises of high quality.”¹⁴

The interpretation of evaluation results demands more insight and 
understanding than is ordinarily required for handling results from 
more traditional objective tests, and as yet few guides other than 
those provided with the specific instruments themselves have been 
set up as aids to users. Moreover, the broad, integrative nature of 
evaluation and the wide variety of instruments and techniques pre- 
clude definite limitations on the use of results. Therefore, the 
considered judgment of the evaluator not only of the direct results 
of evaluation but also of all other sources of information about 
individual pupils should be exercised in drawing conclusions and 
in deciding upon any indicated courses of action. 


Topics for Discussion 


1. How are evaluative tests distinguishable from other paper-and-pencil
achievement tests? 

2. For what grade levels are evaluative tests most often provided at 
present? 

3. How do interpretive tests differ from more traditional tests in pur- 
poses and testing techniques? 

4. What are some major characteristics of tests of practices and activi- 
ties? 

5. How can a pupil profile chart serve as a pupil progress chart? 

6. For what types of information should a cumulative pupil record 
make provision? 

7. Describe a typical class analysis chart and discuss its uses by the 
teacher. 

8. Briefly discuss the interview and the questionnaire as evaluative 
techniques. 

9. How can the teacher make effective use of evaluative tests, tools, and 
techniques in the classroom? 

14 Robert L. Ebel, “Writing the Test Item.” Educational Measurement. American 
Council on Education, Washington, D. C., 1951. p. 246. 
Using Intelligence and
Aptitude Tests 


THE ASPECTS of intelligence, intelligence testing, and the use of 
intelligence test results that are given major attention in this chapter 
are as follows: 


Definitions of intelligence.
Theories concerning the nature of intelligence.
Individual and group tests of intelligence.
Specific and group-factor intelligence tests.
Performance tests of intelligence.
Scores derived from intelligence tests.
Distribution of intelligence.
Procedures in intelligence testing.
Classroom uses of intelligence test results.


It is important for the student to be conversant with the nature of 
intelligence and with techniques for its measurement. It is also 
important that he be able to obtain and use at least the major types 
of derived scores in furnishing guidance of various types to his 
pupils. This chapter discusses the theory and measurement of intel- 
ligence and the applied aspects of intelligence and intelligence test- 
ing. 

Workers in the field of mental abilities are far from agreement 
both on the correct terminology to use in discussing mental abilities 
and on the exact nature of the ability or abilities to which the terms 
apply. It is therefore very difficult to prepare a brief treatment of 
intelligence and intelligence testing. The discussions of intelligence
in this chapter are based on what the authors believe to be the best 
modern terminology in this field. The reader will doubtless encounter 
instances, however, in which test titles and references will not be 
completely in harmony with the usage to be followed. 


1 NATURE OF INTELLIGENCE 


The exact nature of the combination of abilities known as in- 
telligence is not well understood. However, it is definitely known 
that individuals differ widely in the amount, and perhaps the quality, 
of it they possess, and that within limits it can be measured. 


Definitions of general intelligence 


Many definitions of intelligence have been given. The following 
list presents some of the ones that are most commonly quoted: ¹


Colvin: “An individual possesses intelligence in so far as he has
learned, or can learn to adjust himself to his environment.”

Dearborn: “...the capacity to learn or to profit by experience...”

Henmon: “Intelligence...involves two factors—the capacity for
knowledge and knowledge possessed.”

Pintner: “I have always thought of intelligence as the ability of the 
individual to adapt himself adequately to relatively new situations 
in life.” 

Terman: “An individual is intelligent in proportion as he is able to 
carry on abstract thinking.” 

Thorndike: “We may... define intellect, in general, as the power 
of good responses from the point of view of truth or fact.” 

Woodrow: “It is an acquiring-capacity.” 


Additional definitions taken from Freeman ² are:


Binet: “...the tendency of thought to take and maintain a definite
direction, the capacity to make adaptations for the purpose of
attaining the desired end, and the power of self-criticism.”

Burt: “...the power of readjustment to relatively novel situations...”

Stern: “...the general mental adaptability to new problems and
conditions of life.”


1 Symposium, “Intelligence and Its Measurement.” Journal of Educational
Psychology, 12:123-47, 195-216; March and April 1921.

2 Frank N. Freeman, Mental Tests: Their History, Principles and Applications.
Revised edition. Houghton Mifflin Co., Boston, 1939. p. 248.




The above definitions seem to fall into at least three patterns: 
(1) the rather formal definitions stressing mainly what have been 
called the higher mental powers, (2) the definitions emphasizing 
ability to learn, and (3) the definitions placing major emphasis upon 
adaptability. It is felt that the last type of definition particularly, 
by which intelligence is conceived as the capacity or power of the 
individual to adapt himself to his environment and to new situations, 
is the most meaningful for the purposes of the teacher. However, 
the fact that capacity to learn and ability to think in abstract terms 
are both evidences of intelligence should not be overlooked. 

Freeman ³ listed three concepts of intelligence—the organic, the
social, and the psychological or behavioristic. He considered that the 
third is the only one that is of direct concern to intelligence testers 
and called the others factors of intelligence. The psychological or 
behavioristic concept accepts as intelligence the types of behavior 
that are measured by intelligence tests. Intelligence has been defined 
as “that which intelligence tests measure." This definition is in line 
with Freeman's psychological or behavioristic concept. The definition 
has meaning, for it implies that intelligence, although it has not 
yet been adequately defined or delimited, conditions the individual's 
behavior and that it is, therefore, through observation and measure- 
ment of his behavior that his intelligence can be estimated. 

Stoddard also approached intelligence operationally, although 
he stated that the elements have not been represented, unless ac- 
cidentally, in existing tests, and defined it as: 


the ability to undertake activities that are characterized by (1)
difficulty, (2) complexity, (3) abstractness, (4) economy, (5) adaptiveness
to a goal, (6) social value, and (7) the emergence of originals, and
to maintain such activities under conditions that demand a concentration
of energy and a resistance to emotional forces.⁴


Theories concerning intelligence 


Theories concerning the nature of ability go back as far as pro- 
nouncements of the early philosophers. However, only three of the 
most important theories of the last century are presented here. Two 


3 Frank N. Freeman, “The Meaning of Intelligence,” Intelligence: Its Nature and
Nurture, Thirty-Ninth Yearbook of the National Society for the Study of Education,
Part I. Public School Publishing Co., Bloomington, Ill., 1940. p. 11-20.

4 George D. Stoddard, The Meaning of Intelligence. Macmillan Co., New York,
1943. p. 4.
of them are important to the user of intelligence tests because of
the manner in which they have modified and are now modifying 
teaching and testing practices. 

The faculty theory. According to the faculty theory, intelligence 
consists of a number of relatively independent and largely uncorrelated
and specialized abilities of various types, such as memory,
imagination, honesty, and language ability, to name only a few. The 
closely related theory of formal discipline maintained that these 
faculties could be developed individually by means of general mental 
exercise. However, when the theory of formal discipline was dis- 
proved and the transfer of training concept directed attention to the 
fact that such faculties as those named above are neither psychologi- 
cal entities nor subject to general training, the faculty theory was 
forced into the discard as an explanation of mental abilities. 

The two-factor theory. Spearman first presented his two-factor
theory in 1904.⁵ He proposed a general factor, or g, which enters
into all types of performance, and many specific factors, called s,
which combine with g to determine total activity. Basing his theory
on technical statistical treatments of data, Spearman later added a
third type of factor, called group factors, which represent the overlap
among s factors.⁶ Thus, according to his theory, a g or general
factor, which might be called energy, group factors, such as number 
ability and mechanical ability, and many s or specific factors con- 
stitute ability. 

The multi-factor theory. Spearman's work may be considered the 
forerunner of the present factor analysis approach to the nature of 
mental ability. Among the factor analysts is Thurstone, who isolated 
the seven factors of perceptual, number, verbal, spatial, memory,
inductive reasoning, and deductive reasoning,⁷ which he called primary
mental abilities. These primary abilities might appear on the
surface to relate closely to the “faculties” of the early psychologies,
but the factors emerging from the work of Thurstone and other
exponents of the multi-factor theory not only are substantiated by 
correlational relationships but also appear to have sound psychologi- 
cal evidence to support their existence. 


5 C. Spearman, “ ‘General Intelligence’ Objectively Determined and Measured.” 
American Journal of Psychology, 15:201-93; 1904. 

6 C. Spearman, The Abilities of Man. Macmillan Co., New York, 1927. p. 82. 

7 Louis L. Thurstone, Primary Mental Abilities. Psychometric Monograph Series,
No. 1. University of Chicago Press, Chicago, 1938. 
2 MEASUREMENT OF INTELLIGENCE


Indirect measurement of intelligence 


For practical purposes, intelligence has been defined in a preceding 
section of this chapter as the power to learn or to adapt to new 
situations. These definitions perhaps suggest that this type of ability 
is subject to evaluation in a rather direct manner. Such is not the 
case, however, for ability to learn can only be inferred from the fact 
that learning has occurred in a test situation. Since intelligence itself 
cannot be measured, test makers can only measure the performance 
of tasks the successful completion of which is generally believed to 
be dependent upon intelligence. The value of the intelligence test 
lies in the fact that it affords an objective basis for this inference. 
It samples widely from the fields of learning resulting from experi- 
ences assumed to be common to all persons subjected to the test. 
The pupil's capacity to learn or to adapt to new situations is 
determined by summing up his reactions to the items of the test. 

There is apparently no way of determining very precisely which 
particular fields of human interest or ability should be sampled in 
the attempt to secure this cross section of mental activity. It is 
important that the sampling be sufficiently diverse and representa- 
tive to permit the securing of an estimate in the nature of an 
average that will not penalize a person because he may not have had 
this or that specific experience. Briefly, the measure or average 
obtained from a test that does sample representative reactions is 
taken to be indicative of one's ability to learn or of one's adaptability. 
Roughly, it is assumed that what an individual has learned is 
indicative of his potentialities for learning. Differences in intelligence 
test scores are probably sufficiently accurate, rough as they are, to 
indicate such differences in mental ability. 


Factual and skill content of intelligence tests 


It has been contended, and not without justification, that intel- 
ligence tests do not differ appreciably from achievement tests, inas- 
much as both are founded upon the measurement of knowledges and 
skills that have largely been learned. Obviously, a test of ability to
learn must have some type of content. Intelligence tests admittedly
contain factual and skill materials. Such tests attempt to measure
abilities to see relationships, to draw reasoned inferences, to manipu- 
late, to compare, to contrast, and otherwise to handle materials 
which themselves are so commonly known and at such low difficulty 
levels that all persons who have had any but the most exceptional 
environmental backgrounds should know the necessary facts and have 
the necessary skills for understanding and taking, although not 
necessarily for succeeding upon, the tests. To contend that intel- 
ligence tests have been completely successful in eliminating the 
significance of the factual and skill content would be foolhardy and 
contrary to available evidence. 

A few intelligence tests contain vocabulary sections requiring 
considerable knowledge of word meanings for successful performance. 
Several tests also directly measure knowledges in widely studied 
areas. The justification for the inclusion of factual items in an 
intelligence test is that opportunities are supposed to be similar for 
all persons experiencing a normal environment to learn such facts 
and that the degree to which different persons do so is partial evi- 
dence concerning their intellectual levels. More frequently, however, 
intelligence tests attempt, but with varying degrees of success, to 
minimize the influence of environment upon an individual's test 
performance. 

Kelley 8 stated that general intelligence tests and achievement
tests overlap to the degree indicated by a correlation coefficient of
.90. In general, coefficients of .40 to .60 are found between tested
intelligence and academic achievement, but higher degrees of
relationship are sometimes found. When such correlations approach .70 or
.80, the intelligence test is looked upon with suspicion by some and
may be considered a general scholastic achievement test rather than
an intelligence test. 9

Cattell, believing that general intelligence tests measure acquired
knowledges and skills to a considerable degree and also that they
frequently test abilities of too specific a nature, devised a
culture-free test. 10 The test items, largely pictorial rather than verbal,
were chosen to measure abilities to run pencil mazes, to build up series,
to classify, and to determine relationships of varying degrees of
complexity. The content was so selected as to be largely independent
of acquired or learned meaning, so that the test presumably can be
given with equal fairness to persons reared in any civilized society
and even, by pantomime, to primitive peoples.

8 Truman L. Kelley, Interpretation of Educational Measurements. World Book
Co., Yonkers, N. Y., 1927. p. 208.

9 Paul L. Boynton, “Intelligence.” Encyclopedia of Educational Research.
Macmillan Co., New York, 1941. p. 630.

10 Raymond B. Cattell, “A Culture-Free Intelligence Test I.” Journal of
Educational Psychology, 31:161-79; March 1940.

The teacher should probably admit that intelligence tests in 
varying degrees test factual knowledges and skills which not all 
pupils have had equal opportunities to learn, but he is probably 
justified in the belief that they are at a minimum in at least the 
better tests and that the environments of pupils attending the
typical school are sufficiently similar that all have had approximately 
equal opportunities to acquire such facts and skills as are included 
in the tests. 


3 GENERAL INTELLIGENCE TESTS 


General intelligence tests, both individual and group, are dis- 
cussed and illustrated below so that the student may obtain a more 
complete understanding of the characteristics and representative 
content of these important instruments for the measurement of 
general mental ability. 


Individual scales of general intelligence 


Individual intelligence examinations constitute the most accurate 
devices for the measurement of intelligence. The length of the test, 
the wide variety of reactions called for, the fact that the subject 
receives his instructions personally from the examiner, the fact that 
the examiner is afforded an opportunity to observe each reaction 
made by the subject, and the careful standardization of procedures
for administering the test and scoring the subject's reactions all
contribute to the high degree of accuracy. The full time of an
examiner is required for each pupil tested. The examiner must be a
person who is more capable and efficient in test administration than
is the typical teacher. Furthermore, he must be one who has had
extensive training and experience in giving individual intelligence 
tests. 

Individual intelligence tests are largely patterned upon the Binet-Simon
tests brought out in France from 1905 to 1911. American
adaptations and revisions were published by Goddard in 1911, Kuhlmann
in 1912, Terman in 1916, Herring in 1922, and Terman and
Merrill in 1937. The Terman and Merrill New Revised Stanford- 
Binet Tests of Intelligence is today, as was its 1916 predecessor, the 
best known and most widely used individual test of general intel- 
ligence in America. 

The general procedure in administering the New Stanford-Binet 
is quite representative of that of the other revisions mentioned. The 
type of performance tested varies considerably with the different 
exercises. These test elements are presented to the child by means
of spoken directions. The test should be given in a quiet room where 
there is freedom from distraction. A friendly attitude between 
examiner and subject should be maintained. The examiner is ex- 
pected to make sure that the subject understands what is to be 
done, and in all cases the burden of proof is with the examiner to 
show that the subject has responded in a way that is representative 
of his ability. 

After rapport has been established, i.e., the child has been put at 
ease, the examiner starts the test with materials at a scale level on 
which the subject is likely to succeed with some effort. If he is suc- 
cessful on all tests at this level, the examiner, assuming that he could 
pass all tests at lower levels, passes on to the higher levels and con- 
tinues on through the scale until the subject fails all tests at one age 
level. In effect the child has been tested over the entire scale, for his 
success on all tests at one age level makes almost certain that he 
could pass all tests at lower levels and his failure on all tests of an- 
other, and higher, age level indicates with essential certainty that he 
could go no higher on the scale. The child's mental age is deter- 
mined by giving him credit for the number of years below the level 
on which he passes all tests and adding to this amount the years and 
months of credit assigned to the higher level tests he succeeds in 
passing. 
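
The scoring rule just described can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the function names, the basal credit, and the month credits per test are hypothetical and are not taken from the Stanford-Binet manual, which assigns its own credit to each test.

```python
def mental_age_months(basal_credit_months, credits_above_basal):
    """Return a mental age in months.

    basal_credit_months: credit earned for the level at which every test
        was passed (all lower levels are assumed passed as well).
    credits_above_basal: months of credit for each test passed at higher
        levels (hypothetical values, not the manual's).
    """
    return basal_credit_months + sum(credits_above_basal)


def as_years_months(months):
    """Format months in the years-months style used in the text, e.g. '9-2'."""
    return f"{months // 12}-{months % 12}"


# Hypothetical child: basal credit of 8 years (96 months), plus seven tests
# passed at higher levels, each assumed here to carry 2 months of credit.
ma = mental_age_months(96, [2, 2, 2, 2, 2, 2, 2])
print(as_years_months(ma))  # 9-2
```

Testing stops at the level where every test is failed, so no credits beyond that level enter the sum.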

It is not feasible here to reproduce more than a few sample test 
elements, but the two following samples from the New Stanford- 
Binet, chosen from those most easy to reproduce in limited space, will 
give the student some idea of the nature of the test. 


11 Lewis M. Terman and Maud A. Merrill, Measuring Intelligence. Houghton
Mifflin Co., Boston, 1937. p. 63.

Year III-6, Form L, Test 3, Comparison of Sticks 12


Comparison of Sticks 


Material: Match sticks, cut to 2-inch and 2½-inch lengths.

Procedure: Place the two sticks on the table before the child in the 
positions indicated below and about an inch apart. Say, “Which stick 
is longer?” “Put your finger on the long one.” Give three trials, alter- 
nating the relative positions of the long and the short sticks. In case one 
of the first three trials is failed, give three additional trials, continuing 
to alternate the positions of the sticks. 


[Figure: positions of the two sticks for trials (a), (b), and (c).]


Score: 3 of 3 or 5 of 6. 


Superior Adult I, Form L, Test 2, Enclosed Box Problem 13
Enclosed Box Problem


Material: Any small cardboard box. 
Procedure: Show S. a box and say: 


(a) “Listen carefully. Let’s suppose that this box has 2 smaller boxes 
inside it, and each one of the smaller boxes contains a little tiny box. 
How many boxes are there altogether, counting the big one?” 

(b) “Now let's suppose that this box has 2 smaller boxes inside it and
each one of the smaller boxes contains 2 tiny boxes. How many
altogether?”

(c) “Now suppose that this box has 3 smaller boxes inside it and that
each of the smaller boxes contains 3 tiny boxes. How many boxes are
there altogether?”

(d) “Now suppose that this box has 4 smaller boxes inside it and
that each of the smaller boxes contains 4 tiny boxes. How many are there
altogether?”


Score: 3 plus. 
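
The arithmetic behind the four parts of the Enclosed Box Problem can be checked with a short Python sketch. The function name `total_boxes` is introduced here only for illustration; the counts follow directly from the wording of the items (one big box, the smaller boxes inside it, and the tiny boxes inside those).

```python
def total_boxes(smaller, tiny_per_smaller):
    """Count the big box, the smaller boxes in it, and the tiny boxes in those."""
    return 1 + smaller + smaller * tiny_per_smaller


# (smaller boxes, tiny boxes in each smaller box) for parts (a) through (d)
parts = {"a": (2, 1), "b": (2, 2), "c": (3, 3), "d": (4, 4)}
totals = {label: total_boxes(*sizes) for label, sizes in parts.items()}
print(totals)  # {'a': 5, 'b': 7, 'c': 13, 'd': 21}
```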


The lists of test titles 14 at several age levels of the Form L
Stanford-Binet between Year II and the Superior Adult III, which
represent the bottom and top of the scale, will indicate the variety
of abilities tested, the scalar arrangement of tests from easy to
difficult, and the duplication at different age levels of similar types of
test situations at varying levels of difficulty.

12 Ibid. p. 84.

13 Ibid. p. 125.

14 Ibid. p. 75-132.


Year II

Three-Hole Form Board
Identifying Objects by Name
Identifying Parts of the Body
Block Building: Tower
Picture Vocabulary
Word Combinations


Year V

Picture Completion: Man
Paper Folding: Triangle
Definitions
Copying a Square
Memory for Sentences II
Counting Four Objects


Year VIII

Vocabulary
Memory for Stories: The Wet Fall
Verbal Absurdities I
Similarities and Differences
Comprehension IV
Memory for Sentences III


Year XII

Vocabulary
Verbal Absurdities II
Response to Pictures II
Repeating 5 Digits Reversed
Abstract Words II
Minkus Completion


Average Adult

Vocabulary
Codes
Differences between Abstract Words
Arithmetical Reasoning
Proverbs I
Ingenuity
Memory for Sentences V
Reconciliation of Opposites


Superior Adult III

Vocabulary
Orientation: Direction II
Opposite Analogies II
Paper Cutting II
Reasoning
Repeating 9 Digits


Group tests of general intelligence 


Group intelligence tests originated in America during World War 
I. The Army Alpha and Army Beta tests, the latter really a per- 
formance scale, were developed for use in selecting army recruits 
for officers’ training and for other positions requiring high intel- 
ligence. Shortly after the war, Otis, Terman, and others brought out 
group tests devised for use in the schools, and many such tests were 
published between 1918 and 1925. Approximately ten years then 
elapsed during which few new group tests of general intelligence 
made their appearance. The Army General Classification Test, the 
Aviation Cadet Qualifying Examination, and the Navy General 
Classification Test of World War II are representative of the recent 
counterparts of Army Alpha. A number of revisions of earlier tests 
and of new tests for civilian use have also been published since 
1935. 

Space limitations prevent the use of illustrations from more than 
a few intelligence tests and permit only a brief treatment of the 
testing techniques used. No attempt is made to furnish descriptions 
of any of the group tests of general intelligence. Instead, sample 
items of various types representative of testing techniques are shown 
and briefly commented upon. The only way by which the student can
become truly familiar with intelligence tests is by examination and 
actual use of them. 

The accompanying illustration of two of the Kuhlmann-Anderson
Intelligence Tests shows parts that measure knowledge of the alphabet

Excerpts from Kuhlmann-Anderson Intelligence Tests 15


TEST 27
ABCDEFGHIJKLMNOPQRSTUVWXYZ
EXAMPLES:

The third letter of the alphabet is . . . . . . . . . . . ______

The second letter before the sixth letter . . . . . . . . ______

1. The fifth letter of the alphabet is . . . . . . . . . ______ 1

2. The second letter before the last letter is . . . . . ______ 2

3. The third letter before M is . . . . . . . . . . . . . ______ 3

4. The letter midway between H and N is . . . . . . . . . ______ 4

5. The second letter after the fourth letter is . . . . . ______ 5


TEST 35

Draw a line under the middle one of these three numbers: 3 8 9.
Write here ........ a word meaning the opposite of good.
Draw a line through the middle letter in the longer of these two
words: Revenge, Assert. Write here ........ a word of
five letters meaning the opposite of slow. Write here ........
a word which rhymes with [?]ay and means a part of a week.
Draw a line after each of these two letters A B making
the first line half as long as the second. Think what year this is,
then write here ........ the digits in the reverse order, the one
which belongs last coming first. Cross out one digit in each of these
numbers which does not appear in the other number: 43689, 64378.


15 F. Kuhlmann and Rose G. Anderson, Kuhlmann-Anderson Intelligence Tests,
Sixth edition. Published by Personnel Press, Inc., 1952.

4 SPECIFIC INTELLIGENCE TESTS


The two types of specific intelligence tests—aptitude and readi- 
ness—differ primarily on the age and maturity levels of the pupils 
to whom they are given. Whereas aptitude tests presuppose some 
ability to read and compute, readiness tests assume that such skills 
have not yet been acquired. Aptitude tests first made their appear- 
ance nearly forty years ago, but readiness tests have been in use 
mainly since the early thirties. 


Aptitude tests


Aptitude tests are now available for a number of areas of per- 
formance, such as those involved in various occupations in the trades 
and industry, various broad areas of performance commonly dealt 
with in the school, and various narrow areas of performance largely 
unique to the school. The various types of aptitude tests largely 
possess in common the characteristic of testing the individual's 
potentialities in terms of the specific abilities resulting from inheri- 
tance and general experience but of disregarding the abilities result- 
ing from specific training or education. Thus aptitude tests parallel 
intelligence tests, although they are narrower in scope.

Teachers and school officers, aside from those engaged in voca- 
tional guidance and placement, are more concerned with aptitudes 
for school subjects and fields of study than with occupational areas. 
Therefore, occupational aptitude tests, sometimes called trade tests, 
will not be discussed intensively in this volume but will receive 
treatment only insofar as some of them are useful in the schools. 

Among the first tests of aptitude to be developed primarily for 
school use were several for mechanical, musical, artistic, and clerical 
abilities. In the academic areas of English, foreign languages,
mathematics, and the sciences, the Iowa Placement Examinations, Aptitude
Series, published in 1925, appear to be the pioneer instruments. These
tests, primarily useful at the college level, were followed by other
aptitude tests for algebra and geometry, English, the foreign
languages, mathematics, and the sciences for secondary-school use.
The accompanying excerpt from the Iowa Algebra Aptitude Test
illustrates the number series type of item rather common to aptitude
tests in mathematics. It is apparent that some persons who could
perform the necessary arithmetical operations for answering item 5,
for example, would not do so because they failed to discover the 
“pattern” of the number series. 

The variety of areas of behavior served by aptitude tests makes 
impracticable a comprehensive discussion of such instruments here. 
They will receive consideration in Chapters 15 to 21 by subject fields, 
in parallel with prognostic tests, which, although frequently measur- 
ing the results of training, have somewhat similar uses. Aside from 
tests in the music and art fields, aptitude tests are devised almost 
exclusively for use at the high-school and college levels. 


Excerpt from Iowa Algebra Aptitude Test 16


Part 3. NUMERICAL SERIES 
Time allowance—12 minutes. 
Directions: Each of the following number series is made up according to some rule. Addition, subtraction,
multiplication, and division, and various combinations of these processes are used in forming the different
series. Discover the rule for each example, decide what the next term would be, and write it on the blank
line following the series. Then place a cross (X) in the circle directly over the answer that agrees with
yours. If no answer agrees with yours place the X in the circle over "Not Given." You will receive no
credit for a correct answer unless it is marked in the correct answer space. The sample is answered correctly.


                                            ANSWERS

Sample  1  2  3  4  5  6  7  8  [remainder illegible]

1.  2  4  6  8  10  ____               1.  11   12   13   Not Given

2.  9  8  7  6  5  ____                2.  5   3   2   Not Given

3.  1  1  5  5  9  9  ____             3.  [illegible]   12   9   Not Given

4.  2  4  8  16  ____                  4.  48   64   96   Not Given

5.  5  8  11  14  17  ____             5.  20   [two choices illegible]   Not Given
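
The remark that a pupil must discover the "pattern" before the arithmetic is of any use can be illustrated with a short Python sketch. It is not the scoring procedure of the Iowa test; the function name `next_term` and the two rules it tries (a repeating pattern of differences, then a constant ratio) are assumptions made here for illustration, applied to the series in the excerpt as far as they are legible.

```python
from fractions import Fraction


def next_term(series):
    """Guess the next term of a short number series, or return None."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    # Rule 1: the differences repeat with a short period
    # (period 1 is an ordinary constant step).
    for period in range(1, len(diffs) // 2 + 1):
        if all(diffs[i] == diffs[i - period] for i in range(period, len(diffs))):
            return series[-1] + diffs[len(diffs) - period]
    # Rule 2: every term is the previous term times a constant ratio.
    if all(term != 0 for term in series):
        ratios = {Fraction(b, a) for a, b in zip(series, series[1:])}
        if len(ratios) == 1:
            result = series[-1] * ratios.pop()
            return int(result) if result.denominator == 1 else result
    return None  # no rule found: the "pattern" was not discovered


print(next_term([2, 4, 6, 8, 10]))     # 12
print(next_term([9, 8, 7, 6, 5]))      # 4
print(next_term([1, 1, 5, 5, 9, 9]))   # 13
print(next_term([2, 4, 8, 16]))        # 32
print(next_term([5, 8, 11, 14, 17]))   # 20
```

A series built on a rule outside these two returns `None`, which mirrors the pupil who can add and multiply but fails the item because the rule was never found.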


Readiness tests 


Readiness tests, found primarily in reading and arithmetic, are 
largely tests of specific intelligence, for they measure the results of 
inheritance and general training rather than of direct instruction. 
As readiness tests imply by their general designation, they measure 
readiness to undertake a new type of activity that is dependent 
upon the maturation of various physical and mental abilities. They 
may in one sense be considered as aptitude tests at the elementary- 
and even the primary-school levels, where they almost entirely 
occur. 


16 H. A. Greene and A. H. Piper, Iowa Algebra Aptitude Test, Revised edition.
Published by Bureau of Educational Research and Service, University of Iowa, 1942.

Tests of this type are usually restricted in applicability to a
particular subject field. However, the Metropolitan Readiness Tests 
are devised for determining the readiness of a child to learn first- 
grade skills of all types, and consequently are briefly discussed and 
illustrated here. The six parts of the test seem to measure the types 
of abilities used primarily in reading and number work. Tests 4 and 
5, for which the instructions are given orally by the examiner and 
which require few skills in pencil manipulation of any complexity, 
measure respectively ability in visual perception and knowledge of 
number. 


Excerpts from Metropolitan Readiness Tests 17

TEST 4. MATCHING

[Illustration not reproduced.]

TEST 5. NUMBERS

[Illustration not reproduced.]


17 Gertrude H. Hildreth and Nellie L. Griffiths, Metropolitan Readiness Tests,
Form S. Published by World Book Co., 1950.

5 GROUP-FACTOR TESTS OF INTELLIGENCE


The factor analysis movement gave rise some twelve or fifteen 
years ago to the first use of group factors of intelligence in testing 
practice. The bi-factor tests made their appearance first, and it 
has been only during the past five or so years that the multi-factor 
tests have passed their experimental stage. Both types of group- 
factor tests may be considered to measure intellectual abilities less 
broad than general intelligence but in major respects broader than 
the areas measured by specific intelligence tests. 


Bi-factor tests of intelligence 


The two factors first represented by distinctive part scores in 
psychological examinations and mental ability tests were variously 
termed linguistic and quantitative in the American Council on Education
Psychological Examination, verbal and non-verbal in the two
Pintner General Ability Tests, and language and non-language in the 
California Test of Mental Maturity. These pairs of scores appear to 
have rather closely similar meanings despite the differences in 
designations for the part scores. 

The bi-factor test is illustrated by the accompanying excerpts 
from the California Short-Form Test of Mental Maturity. Test 2
and Test 7 are respectively from the non-language and language 
portions of the instrument. Mental ages and intelligence quotients 
can be obtained separately for these two major factors as well as 
for general intelligence. 


Multi-factor tests of intelligence 


Thurstone’s Tests of Primary Mental Abilities were pioneers in 
the multi-factor types of tests, but they were used primarily for 
experimental purposes for some years following their issuance in 
1938. The SRA Primary Mental Abilities Tests and the Chicago 
Tests of Primary Mental Abilities, both prepared by Thurstone, 
have been in use for several years in the measurement of such factors 
of intelligence as verbal-meaning, space, reasoning, memory, number, 
and word-fluency at the age 11 to 17 level. Comparable tests by 
Thurstone for younger children specify similar lists of factors.


Excerpts from California Short-Form Test of Mental Maturity 18


TEST 2

DIRECTIONS: In each row find the drawing that is a different view of the first drawing.
Mark its number as you are told.

[Drawings not reproduced.]


TEST 7

DIRECTIONS: Mark as you are told the number of the word that means the same or about
the same as the first word.

 95. blossom       1 tree       2 vine        3 flower        4 garden
 96. strange       1 real       2 tell        3 certain       4 unknown
 97. reply         1 news       2 answer      3 note          4 open
 98. liberty       1 benefit    2 seize       3 freedom       4 aid
 99. assist        1 consent    2 help        3 agree         4 overlook

120. invariably    1 probably   2 seldom      3 always        4 motionless
121. detect        1 remove     2 discover    3 overtake      4 apply
122. reluctantly   1 gladly     2 instantly   3 certainly     4 unwillingly
123. inefficient   1 unruly     2 prudent     3 incompetent   4 inevitable
124. facetious     1 active     2 fragile     3 humorous      4 inventive
125. ambiguous     1 hard       2 doubtful    3 responsible   4 confident


The accompanying illustrations from four of the Differential
Aptitude Tests show some of the techniques used in multi-factor
tests. The verbal reasoning test employs an analogy type of item in 
which a numbered response is used to fill the first blank and a 
lettered response is required to complete the analogy. In the abstract 
reasoning test the appropriate lettered response is selected to carry 
on the progression established in the four left-hand figures. The 
problem in the space relations test is to select the lettered response 
that represents the three-dimensional figure resulting when the 
left-hand figure is folded and assembled. In the mechanical reasoning 
test the nature of the problem is self-evident. 


18 Elizabeth T. Sullivan, Willis W. Clark, and Ernest W. Tiegs, California
Short-Form Test of Mental Maturity, Intermediate. Published by California Test Bureau,
1950.

Excerpts from Differential Aptitude Tests 19


VERBAL REASONING

X.  ...... is to street as rd. is to ......
    1. lo.     2. ma.     3. st.     4. aw.
    A. city    B. France  C. end     D. road

9.  ...... is to cavalry as foot is to ......
    1. horse   2. cemetery   3. votary   4. hiding
    A. yard    B. travel     C. armory   D. infantry


ABSTRACT REASONING

[Figures not reproduced.]


SPACE RELATIONS

[Figures not reproduced.]


MECHANICAL REASONING

[Figure not reproduced.]

When the top pulley turns in the
direction shown, which way will
the lower pulley turn?

(If either, mark C.)


19 George K. Bennett, Harold G. Seashore, and Alexander G. Wesman, Differential
Aptitude Tests, Form A. Copyright by Psychological Corporation, 1947.

6 PERFORMANCE TESTS OF INTELLIGENCE AND APTITUDE


Performance tests require motor or manual rather than verbal 
responses. In their simplest form language is required neither in 
administering the tests nor in responding to them. With the exception
of certain form board tests for measuring complex types of
mechanical aptitude, they are devised mainly for use with very young 
children, with mental defectives, and with persons unable to use 
English with reasonable efficiency. Therefore, their primary purpose 
seems to be the measurement of abilities not requiring language 
proficiency, or the measurement of abilities in certain types of 
persons for whom tests demanding reading and writing are precluded 
by their language handicaps. Both illiterates and persons who can 
read, write, and speak a foreign language with fluency but who are 
deficient in the ability to use English are included in this last group. 


Fig. 19. Tests of the Pintner-Paterson “Long” Performance Scale 20


20 Rudolf Pintner and Donald G. Paterson, Pintner-Paterson Performance Scale,
Long Form. Published by C. H. Stoelting Co.

Two types of performance tests may be distinguished—those
requiring the use of a pencil for marking, but not for writing, and 
those requiring manipulations of various items of testing equipment. 

The Army Beta, and such revisions as the Kellogg-Morton Re- 
vised Beta Examination for use with adults who cannot read, write, 
or perhaps even understand English, illustrate the first type. Direc- 
tions are given by pantomime, and the subjects respond by tracing 
mazes, indicating whether groups of numbers are alike or unlike, and 
supplying missing elements in pictures. 

The second type of test, requiring manipulation of apparatus, 
depends largely upon form boards that are not unlike jigsaw puzzles. 
The accompanying reproduction of the tests comprising the Pintner-Paterson
“Long” Performance Scale shows the general nature of
form boards used in the measurement of mental ability. Directions 
are usually given orally by the examiner. The pupil's success is 
measured by time, errors, moves, and other evidences of success or 
failure. Fifteen separate tests, each of which nets a mental age 
score, are included in the scale. The median of these mental ages 
is taken as the pupil’s mental ability measure. 
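
Taking the median of the fifteen mental ages can be sketched as follows. The fifteen values are hypothetical, invented only to show the computation, and are expressed in months.

```python
from statistics import median

# Hypothetical mental ages (in months) earned on the fifteen separate tests.
test_mental_ages = [96, 102, 90, 108, 99, 105, 93, 111, 100, 97, 104, 88, 110, 101, 95]

# With fifteen scores the median is the eighth value when they are ranked.
scale_mental_age = median(test_mental_ages)
print(f"{scale_mental_age // 12}-{scale_mental_age % 12}")  # 8-4
```

Because the median depends only on the middle score, one unusually high or low result among the fifteen tests does not shift the pupil's measure.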


7 DERIVED RESULTS OF INTELLIGENCE TESTING 


A raw score from a test has little or no meaning unless it can be 
compared in some manner with other similarly obtained and com- 
parable raw scores. This general principle applies to intelligence 
tests as well as to achievement tests. Therefore, it is important that 
the teacher know the meaning of, and the method of obtaining, the 
most common types of derived measures used in the interpretation 
of intelligence test results. As the methods of obtaining the most 
important of the derived scores discussed below are given fully in 
Chapter 13, only general meanings are treated here. 


Mental age (MA) 


Terman defined mental age as “that degree of general mental
ability which is possessed by the average child of corresponding
chronological age,” and as “an index of absolute mental level”
indicating “the level of development which a child has reached at
a given time.” 21 For example, a child has a mental age of ten years
if his level of mental development is equal to that of the normal
child of exactly ten years. Thus if a representative group of pupils,
all of whom are ten years of age, makes an average score of 45 on
an intelligence test that is being standardized, any pupil who
subsequently takes this test and earns a score of 45 is said to have a
mental age of ten years. An average score for each age group is
established in the same manner.

21 Lewis M. Terman, The Intelligence of School Children. Houghton Mifflin Co.,
Boston, 1916. p. 7-8.
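
The conversion of a raw score to a mental age by way of age norms can be sketched in Python. Only the pairing of a score of 45 with age ten comes from the example above; the other norm values, the name `AGE_NORMS`, and the use of straight-line interpolation between age groups are assumptions made for illustration.

```python
# Hypothetical age norms: (age in years, average raw score of that age group).
# Only the entry (10, 45) is taken from the text.
AGE_NORMS = [(8, 31), (9, 38), (10, 45), (11, 51), (12, 56)]


def mental_age_months(score):
    """Convert a raw score to a mental age in months by interpolating between norms."""
    if score < AGE_NORMS[0][1] or score > AGE_NORMS[-1][1]:
        raise ValueError("score falls outside the range covered by the norms")
    for (age_lo, score_lo), (age_hi, score_hi) in zip(AGE_NORMS, AGE_NORMS[1:]):
        if score_lo <= score <= score_hi:
            fraction = (score - score_lo) / (score_hi - score_lo)
            return round((age_lo + fraction * (age_hi - age_lo)) * 12)


print(mental_age_months(45))  # 120 months, a mental age of ten years
print(mental_age_months(48))  # 126 months, a mental age of 10-6
```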

The mental age (MA) is a measure of mental level or of mental 
maturity of the individual. Taken alone it tells nothing of how 
relatively bright or dull the child may be, but it does give an indica- 
tion of the level of ability at which the child potentially can work. 
For example, information to the effect that a certain child has a 
mental age of 7-6 does not enable a person to judge whether the child 
is bright, average, or dull. It is only when he knows or at least can 
estimate the child's chronological age that he can draw conclusions 
concerning the child's brightness. 

The mental age should probably be considered a specific rather 
than a general concept. That is, a child does not have just one mental 
age at a given time; he has many. 22 His mental age, then, depends
upon the particular test or tests by which it has been determined, 
and such tests may be specific intelligence and group-factor tests as 
well as general intelligence tests, although the latter are the tests 
in the areas of mental ability most commonly providing mental age 
norms. 


Intelligence quotient (IQ) 


When the chronological age (CA), i.e., life age in years and
months, is known for a pupil, and his mental age (MA) has been
determined from his score on an intelligence test, his intelligence
quotient (IQ) can be computed. The intelligence quotient is a
simple method of expressing the relationship between a pupil's
mental age and his chronological age. To obtain the IQ, a child's
mental age (in months) is divided by his chronological age (in
months), the result is multiplied by 100 to remove the decimal point,
and the whole number nearest to the result is taken as his intelligence
quotient. The formula is:

\[ IQ = \frac{100 \, MA}{CA} \]


22 Terman and Merrill, op. cit. p. 25.

If this formula is applied for a child who has a mental age of
twelve years six months (150 months) when he is ten years five
months (125 months) of age chronologically, the following is the
result:

\[ IQ = \frac{100 \times 150}{125} = 120 \]
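
The computation of the intelligence quotient can be expressed as a short Python sketch. The function names are introduced here for illustration, and rounding an exact half upward is an assumption, since the text specifies only "the whole number nearest to the result."

```python
def to_months(years, months):
    """Convert an age given in years and months to months."""
    return years * 12 + months


def intelligence_quotient(ma_months, ca_months):
    """Return 100 * MA / CA rounded to the nearest whole number (halves round up)."""
    if ca_months <= 0:
        raise ValueError("chronological age must be positive")
    # Integer arithmetic: floor(100 * MA / CA + 1/2) without floating point.
    return (200 * ma_months + ca_months) // (2 * ca_months)


# The example from the text: MA of 12-6 and CA of 10-5.
print(intelligence_quotient(to_months(12, 6), to_months(10, 5)))  # 120
```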


The intelligence quotient is a measure of the pupil’s relative
brightness. If it is assumed that a typical child grows in mentality at the
same rate as he ages chronologically, it then appears that children
who have IQs over 100 are above average and children who have IQs
below 100 are below average. This is in harmony with the usual
indication of normal intelligence as being represented by IQs
between 90 and 110, for people of normal intelligence center around
but are not necessarily exactly at the average of intelligence.
However, as this concept of the average is applicable only in
terms of the population as a whole and as very few pupil groups
are average in this sense, the teacher should not generalize this
statement and make it apply to pupil groups in the school. The IQ
alone tells nothing about the level of work of which a child is capable,
for two children of age six and age twelve might both have IQs of
110 and yet the younger child would be entirely incapable at that
time of types of performance commonplace to the older child.
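
The point that equal quotients do not mean equal levels of work can be checked by solving the quotient formula for the mental age. The sketch below is illustrative; the function name is hypothetical and the result is rounded to the nearest month.

```python
def mental_age_from_iq(iq, ca_months):
    """Solve IQ = 100 * MA / CA for MA, in months, rounded to the nearest month."""
    return round(iq * ca_months / 100)


# Two children, aged six and twelve, both with an IQ of 110.
for age_years in (6, 12):
    ma = mental_age_from_iq(110, age_years * 12)
    print(age_years, f"{ma // 12}-{ma % 12}")
# 6 6-7
# 12 13-2
```

The two children share a quotient, yet their mental ages differ by more than six years.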

The mental growth curve. The curve of mental growth has long 
been under scrutiny and has been subjected directly and indirectly 
to many research studies by psychologists. However, no completely 
satisfactory unit of mental growth has yet been found. This fact, 
which results from the lack of an absolute zero point of intelligence, 
from the lack of a simple and constant mental growth unit, and for 
other technical reasons, gives rise to a major problem in the measure- 
ment of intelligence for persons beyond their late-middle teens in 
chronological age. In practice, intelligence tests handle this problem 
in various ways, but a common method is to use the individual’s 
actual chronological age in computing the IQ until he attains the age
of fourteen to eighteen and from that point to assume for purposes of 
computing his intelligence quotient that his chronological age remains 
constant for the remainder of his life. The justification for doing so is 
found in the shape of the mental growth curve. Progressing upward 
very rapidly during early life, and slowing down somewhat during 
childhood and the early teens, it flattens out to almost a horizontal
line by the age of sixteen or so. Although Thorndike presented
evidence to show that mental growth continues into the early twenties, 23
the annual increments or additions beyond the age of sixteen are quite 
small. 
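
The practice of holding the chronological age constant beyond the middle teens can be sketched as follows. The ceiling of sixteen years used as the default is a hypothetical choice within the range of fourteen to eighteen mentioned above; individual tests set their own value.

```python
def iq_with_age_ceiling(ma_months, ca_months, ceiling_months=16 * 12):
    """Compute the IQ, treating any chronological age above the ceiling as the ceiling."""
    effective_ca = min(ca_months, ceiling_months)
    # Nearest whole number of 100 * MA / CA, using integer arithmetic.
    return (200 * ma_months + effective_ca) // (2 * effective_ca)


# A hypothetical adult of thirty with a mental age of 16-0.
print(iq_with_age_ceiling(192, 30 * 12))  # 100
# Without the ceiling the same adult would appear to have an IQ near 53.
print(iq_with_age_ceiling(192, 30 * 12, ceiling_months=30 * 12))  # 53
```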

Constancy of the IQ. A heated controversy over the constancy of the 
intelligence quotient has been waged during the last fifteen years. 
Although it has been recognized for many years that the IQ obtained
by the use of the best modern tests fluctuates within limits because
the tests are not perfectly reliable, and that major environmental
changes for an individual may well be reflected in his IQ, rather
startling evidence was presented some twenty years ago 24 to show
average gains of twenty IQ points for 600 children who had attended
preschool for four years. Later and more startling evidence 25 showed
that children of dull parentage who were placed in foster homes 
shortly after birth had mean intelligence quotients of 116 when 
they were tested a few years later. These and other studies support 
the belief that the intelligence quotient is significantly influenced by 
very favorable environments. 

Although such findings have not been uniformly obtained by experimenters,²⁶
they are supported by other types of experimental evidence
revealing at least the possibility of marked changes in
intelligence quotients as the result of improved environments.²⁷
Stoddard summed up the evidence on inconstancy of the IQ²⁸ and
pointed out Binet’s expression of the belief²⁹ that the IQ is subject
to improvement under desirable conditions of stimulation.


23 Edward L. Thorndike, Elsie O. Bregman, and Ella Woodyard, Adult Learning.
Macmillan Co., New York, 1928. p. 127. 

24 Beth L. Wellman, “The Effect of Pre-School Attendance on the IQ.” Journal 
of Experimental Education, 1:48-69; September 1932. 

25 Harold M. Skeels, “Mental Development of Children in Foster Homes.” Journal 
of Consulting Psychology, 2:33-43; March-April 1938. 

26 Florence L. Goodenough and Katharine M. Maurer, “The Mental Development 
of Nursery-School Children Compared with That of Non-Nursery-School Children.” 
Intelligence: Its Nature and Nurture, Thirty-Ninth Yearbook of the National 
Society for the Study of Education, Part II. Public School Publishing Co., Bloom- 
ington, Ill.; 1940. p. 161-78. 

27 Percival M. Symonds, “Psychological Tests and Their Uses: Review and Pre- 
view.” Review of Educational Research, 8:217-20; June 1938. 

28 George D. Stoddard, “The IQ: Its Ups and Downs.” Educational Record, 
20:44-57, Supplement No. 12; January 1939. 

29 Alfred Binet, Les Idées Modernes sur les Enfants. Ernest Flammarion, Paris, 
1909. p. 146.

The answer to this question may never be known for certain. In
fact, the IQ itself is under attack and may in time be replaced by a
more satisfactory measure. However, the vast majority of school 
children do not undergo such radical changes of environment during 
their school careers that the problem is of great practical significance 
to the teacher. Yet, as there are questions concerning motivation, 
emotional adjustment, optimum placement of pupils, and many 
others that bear significantly upon pupil performances not only on 
intelligence tests but also on achievement tests and in scholarship, 
the teacher should at least be aware of this controversial issue and 
some of its implications. 

Social class and the IQ. Results from group intelligence tests have 
tended to show that children from certain socioeconomic groups 
attain higher mean scores than do children from other, and lower, 
socioeconomic groups. For example, rural children typically score 
lower than urban children, southern white pupils regularly score 
lower than northern white pupils, and children from working-class 
homes attain lower average scores than do those from homes at the 
professional and managerial levels. 

Warner, Meeker, and Eells³⁰ showed that the cultural patterns of
homes at different socioeconomic levels differ greatly. Davis, Havighurst,
and others³¹ obtained evidence to show that standard intelligence
tests are not “culture free” but that they reflect the cultural
biases of the upper-middle-class test constructors. Davis³² indicated
that differences of 8 to 12 IQ points for children from six to ten
years of age and as high as 20 to 23 IQ points for children fourteen
years of age between low and high socioeconomic groups reflect the
cultural bias of the tests. He stated that culturally-fair tests used
experimentally show pupils of low and high socioeconomic status
to be closely similar in “innate intelligence” or “real intelligence.”

These findings concerning the influence of culture, or home environment,
on the IQ as obtained from standard intelligence tests
appear to be in harmony with the evidence concerning the inconstancy
of the IQ. If the results are borne out by more extensive


30 W. Lloyd Warner, Marchia Meeker, and Kenneth Eells, Social Class in America.
Science Research Associates, Chicago, 1949. 


31 Ibid. p. 26.

32 Allison Davis, “Socio-Economic Influences on Learning.” Phi Delta Kappan,
32:253-56; January 1951.

research, traditional methods of using the IQ in pupil guidance may
well require revision. 

Future of the IQ. It is apparent from the above discussion that the 
intelligence quotient is far from a perfect measure of brightness. It 
appears to be a more accurate measure for the years of middle child- 
hood than for the first years of life or post-adolescent years. Its 
constancy seems to be somewhat in question. The influence of socio- 
economic backgrounds may be significant. These weaknesses and 
others of a more technical nature raise logical questions concerning 
its continued and final acceptance as the best measure of brightness, 
although it is still one of the most satisfactory measures from which 
to predict success in school and is highly useful in pupil guidance. 
The alternative methods discussed below for indicating intelligence 
represent attempts to obtain a more satisfactory measure. 

Freeman, after analyzing the problem carefully, stated that: 


It may be true that the IQ is more convenient, but it is a question
whether its inherent ambiguity does not make it better policy to adopt
the statistically superior standard score and to educate teachers to
understand and use it.³³


Another attack on the IQ³⁴ recommended that the age-scale
method of measuring intelligence be abolished, advocated the replacement
of the mental age concept by a combination of measures
from separate tests, and took the stand that the controversy concerning
the constancy of the IQ is largely futile because its constancy
or inconstancy does not depend upon fundamental issues but upon
the manner in which tests provide means of obtaining the IQ.

Although the teacher should certainly understand the nature and 
proper uses of the IQ, he should also have some realization of its
limitations, technical though they may be, and should be alert to the 
alternative methods for designating levels of intelligence which have 
been developed and which may be evolved in the future. The presentation
of several alternatives below should take on additional significance
in view of the apparent waning of prestige of the IQ.

33 Frank N. Freeman, Mental Tests: Their History, Principles and Applications, 
Revised edition. Houghton Mifflin Co., Boston, 1939. p. 103.


34 M. W. Richardson, “The Logic of Age Scales.” Educational and Psychological 
Measurement, 1:25-34; January 1941.


Personal constant (PC)


Heinis developed the personal constant³⁵ for the purpose of
obtaining a measure that would be more accurate than the IQ for
persons of very superior and very inferior intelligence levels. The 
measure, which he called the per cent of average development, but 
which is better known as the personal constant, is intended to give 
quantitative expression to the normal curve of mental growth in 
terms of growth units that have constant meaning at all age levels. 
The PC is computed by converting both the mental age and the 
chronological age to growth units by the use of a table of mental 
growth units,³⁶ dividing the MA value by the CA value, and multiplying
by 100. Thus, the PC involves the substitution of growth
units for MA and CA in the IQ formula.
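The computation just described can be sketched as follows. The sketch shows only the arithmetic; the published table of mental growth units is not reproduced here, and the values in `toy_units` are invented placeholders that have no connection with Heinis's or Hilden's figures. The function name and its parameters are likewise hypothetical.

```python
def personal_constant(ma_months, ca_months, growth_units):
    """Per cent of average development: 100 * units(MA) / units(CA).

    growth_units maps an age in months to mental growth units. In real
    use the mapping would come from a published table.
    """
    return 100 * growth_units[ma_months] / growth_units[ca_months]


# Invented placeholder values, for illustration only. They merely imitate
# a growth curve that rises quickly at first and then flattens.
toy_units = {96: 200.0, 120: 250.0, 144: 280.0}

# A ten-year-old (120 months) with a mental age of twelve years (144 months).
print(personal_constant(ma_months=144, ca_months=120, growth_units=toy_units))
```

With these made-up units the pupil's PC is 112, whereas the ordinary ratio of mental age to chronological age would give 120. The difference arises only because the assumed growth units shrink at the higher ages.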

Although Kuhlmann recommended that users of the Kuhlmann-Anderson
Intelligence Tests employ it rather than the IQ³⁷ and
Hilden found that the PC fluctuates less than the IQ,³⁸ Cattell found
the IQ to be definitely more constant for bright children and somewhat
less constant for dull children than is the PC.³⁹ Freeman noted
that the computation of the personal constant is more time-consuming
than is that of the intelligence quotient, and indicated that the
evidence now available concerning the values of the PC is inconclusive.⁴⁰


Index of brightness (IB) 


The index of brightness is stated in the same form as the IQ.
While its meaning is somewhat similar to that of the IQ, it is derived
in quite a different manner. In this case, the pupil's relative brightness
is expressed as a positive or negative deviation from the norm
of pupils of his age. The difference between a pupil's score and the


35 H. Heinis, “A Personal Constant.” Journal of Educational Psychology, 17:163-
86; March 1926. 

36 Arnold H. Hilden, Table of Percent of Average Development Based on Mental 
Growth Units. Educational Test Bureau, Minneapolis, 1936.

37 F. Kuhlmann and Rose G. Anderson, Instruction Manual: Kuhlmann-Anderson
Intelligence Tests, Fifth edition. Educational Test Bureau, Minneapolis, 1940.
p. 17.

38 Arnold H. Hilden, “A Comparative Study of the Intelligence Quotient and
Heinis’ Personal Constant.” Journal of Applied Psychology, 17:355-75; August 1933.

39 Psyche Cattell, “The Heinis Personal Constant as a Substitute for the IQ.”
Journal of Educational Psychology, 24:221-28; March 1933.

40 Freeman, op. cit. p. 296.

norm for persons of the same chronological age is added to (if his
score is above the norm) or subtracted from (if his score is below
the norm) 100 to obtain his index of brightness. Otis, who used the
measure for his Quick-Scoring Group Tests of Mental Ability, himself
stated that the index of brightness has the same significance as
an intelligence quotient.⁴¹ Freeman, however, pointed out that the
method by which the IB is derived makes improbable its consistency
with the IQ.⁴²
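The derivation of the index of brightness reduces to one line of arithmetic, sketched here. The score and norm values are hypothetical and are not taken from the Otis norms; the function name is an assumption made for illustration.

```python
def index_of_brightness(score: float, age_norm: float) -> float:
    """IB = 100 plus the pupil's deviation from the norm for his age.

    A score above the norm adds to 100; a score below it subtracts.
    """
    return 100 + (score - age_norm)


# Hypothetical raw scores against a hypothetical age norm of 45 points.
print(index_of_brightness(score=52, age_norm=45))  # seven points above the norm
print(index_of_brightness(score=38, age_norm=45))  # seven points below the norm
```

Because the deviation is a simple difference in score points and not a ratio of ages, an IB and an IQ for the same pupil need not agree, which is the substance of Freeman's objection.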


Percentile scores 


Percentile scores (also called centile scores) are frequently used to 
indicate a pupil's status in intelligence. This method is used particu- 
larly at the high-school and college levels, for the intelligence 
quotient, as has been pointed out above, is not as meaningful a 
measure for post-adolescent and adult years as it is for periods of 
childhood and adolescence. The percentile score describes a pupil's 
placement in an age or grade group in terms of the percentage of the 
group scoring lower than he does. The American Council on Educa- 
tion Psychological Examinations at both the high-school and college 
levels present norms for the interpretation of scores in terms of 
percentiles for different grade levels. 
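The definition given above, the percentage of the group scoring lower than the pupil, can be sketched directly. The scores are invented, and the handling of tied scores is an assumption: here pupils with the same score are not counted as lower, although published norms may treat ties differently.

```python
def percentile_score(score, group_scores):
    """Percentage of the comparison group scoring lower than `score`."""
    if not group_scores:
        raise ValueError("the comparison group is empty")
    lower = sum(1 for s in group_scores if s < score)
    return 100 * lower / len(group_scores)


# Hypothetical raw scores for a grade group of ten pupils.
grade_group = [31, 35, 38, 40, 42, 45, 47, 50, 53, 58]

# Six of the ten pupils score below 47.
print(percentile_score(47, grade_group))
```

The result depends entirely on the group chosen for comparison; the same raw score would earn a different percentile against the norms for another grade.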


Standard scores 


Another type of measure that indicates a pupil’s intelligence level 
in terms of his position within a certain age or grade group is based 
on the arithmetic mean and the standard deviation. Most frequently
called standard scores, such measures have advantages over such
relative measures of placement as percentile scores and are thought
by some⁴³ to be superior to the PC as derived scores of intelligence.
The Merrill-Palmer Scale of Mental Tests uses standard score norms.
Terman and Merrill presented tables for the use of research workers
and other persons in converting IQs obtained on the New Revised
Stanford-Binet Tests of Intelligence into standard scores.⁴⁴
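A standard score states how many standard deviations a score lies above or below the mean of the group. The sketch below uses the usual definition, (score − mean) / standard deviation. The mean of 100 and standard deviation of 16.6 are the values this chapter cites for the Stanford-Binet; the function itself is an illustration and does not reproduce the Terman and Merrill conversion tables.

```python
def standard_score(score: float, group_mean: float, group_sd: float) -> float:
    """Deviation of a score from the group mean in standard deviation units."""
    if group_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - group_mean) / group_sd


# IQs expressed as standard scores, assuming a mean of 100 and an SD of 16.6.
for iq in (83.4, 100, 133.2):
    print(iq, round(standard_score(iq, group_mean=100, group_sd=16.6), 2))
```

Unlike percentile scores, equal differences between standard scores represent equal distances on the score scale.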


41 Arthur S. Otis, Manual of Directions for Gamma Test: Otis Quick-Scoring 
Mental Ability Tests. World Book Co., Yonkers, N. Y., 1937. p. 4. 

42 Freeman, op. cit. p. 300. 

43 Francis N. Maxfield, “Trends in Intelligence Testing.” Educational Research 
Bulletin, 15:134-41; May 13, 1936. 

44 Terman and Merrill, op. cit. p. 42.


INTELLIGENCE AND APTITUDE TESTS 265 


8 DISTRIBUTION OF INTELLIGENCE 


It is important that the teacher know something of the manner in 
which intelligence is distributed if he is to make effective use of 
intelligence test results. The many reports of the distribution of 
intelligence show, however, that no single pattern of the distribution 
of intellectual ability can be expected to apply widely to different 
school situations. Typical groups of school children are not un- 
selected, as might be supposed, but have been affected variously in 
their composition by many selective factors. 

Intelligence can be conceived of both in terms of some such 
measure as the IQ and in terms of descriptions of the types of performance
possible for persons of different intelligence levels. A distribution
of intelligence quotients for an unselected group of children
and the general descriptive terms used for different levels are pre- 
sented here as an indication of the general distribution of intelligence. 


TABLE 8. Distribution of intelligence quotients in a normal population⁴⁵


                                                 Percentages of
Classification                IQ                 All Persons

Near genius or genius         140 and above           1
Very superior                 130-139                 2.5
Superior                      120-129                 8
Above average                 110-119                16
Normal or average             90-109                 45
Below average                 80-89                  16
Dull or borderline            70-79                   8
Feeble-minded: moron          60-69                   2.5
               imbecile, idiot  59 and below          1


Table 8 shows the distribution of intelligence quotients for a 
normal population. Figure 20 presents the same data graphically. 
It will be noted that 45 per cent of the population fall within ten 
IQ points of the average IQ of 100. On the average, one person in
each 100 is in the genius or near-genius class and one person in each


45 Adapted from Terman and Merrill, op. cit., p. 38-41. A standard deviation of 
16.6 is used, in approximate accordance with results from Forms L and M of the 
Stanford-Binet Examination for all age groups between two and eighteen.

100 is in the very low feeble-minded group. About 10 to 12 per cent
of the total may be considered as distinctly superior and 10 to 12
per cent as distinctly inferior. Persons at the highest level of feeble- 
mindedness, i.e., morons, are not uncommon in the lower grades of 
the school. 
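The classifications and percentages of Table 8 can be written as a small lookup, which also permits a check on the figures quoted in the text. This is an illustrative sketch; the structure and the names `TABLE_8`, `classify_iq`, and `percent_at_or_above` are assumptions, and whole-number IQs are assumed.

```python
# (lower IQ bound, classification, per cent of all persons), as in Table 8.
TABLE_8 = [
    (140, "Near genius or genius", 1.0),
    (130, "Very superior", 2.5),
    (120, "Superior", 8.0),
    (110, "Above average", 16.0),
    (90, "Normal or average", 45.0),
    (80, "Below average", 16.0),
    (70, "Dull or borderline", 8.0),
    (60, "Feeble-minded: moron", 2.5),
    (None, "Feeble-minded: imbecile, idiot", 1.0),  # 59 and below
]


def classify_iq(iq):
    """Return the Table 8 classification for an IQ."""
    for lower, label, _ in TABLE_8:
        if lower is None or iq >= lower:
            return label


def percent_at_or_above(lower_bound):
    """Per cent of all persons in the classes starting at lower_bound or higher."""
    return sum(p for lower, _, p in TABLE_8
               if lower is not None and lower >= lower_bound)


print(classify_iq(105))
print(sum(p for _, _, p in TABLE_8))  # the percentages account for everyone
print(percent_at_or_above(120))       # the "distinctly superior" share
```

The classes at 120 and above together hold 11.5 per cent of the population, and the classes below 80 hold the same share, in agreement with the 10 to 12 per cent stated above.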


[Figure 20. Horizontal axis: Intelligence Quotients]
Fig. 20. Percentages of persons in a normal population at different
levels of intelligence


9 DERIVED MEASURES RELATING INTELLIGENCE AND
ACHIEVEMENT 


It has been pointed out in the preceding chapters that intelligence 
tests are designed to measure primarily innate or inherited abilities 
and that achievement tests are intended to measure the results of 
education and experience. In one sense, then, intelligence tests can 
be considered as measuring the capacity to learn or the potentialities 
for achievement and achievement tests can be considered as measur- 
ing what has been learned. It seems natural, therefore, that an effort 
should be made to discover how well the individual is living up to 
his potentialities by comparing his performances on intelligence and 
achievement tests. 

Two general procedures have been used for this purpose—those 
based on quotients and those based on differences. Two specific
methods are discussed here. Only the first, which is discussed more 
fully, has come into wide use, but the other procedure is briefly 
presented so that the student may better grasp the problems involved 
in a reliable comparison of ability with achievement. As is pointed 
out later, measures of this type are highly questionable in their use 
with individual pupils, and even for use with pupil groups they must 
be interpreted with care and with regard for the many variables that 
condition their use if important pupil adjustments are to be made on 
the basis of the results.


Accomplishment quotient (AQ)


The accomplishment quotient or achievement quotient, also some- 
times called the accomplishment or achievement ratio, represents the 
relation between the educational level (EA) and mental maturity 
(MA) or between the relative educational development (EQ) and 
relative brightness (IQ) of a pupil. Therefore, the formula for the
AQ is, in several adaptations, 


\[
AQ = 100 \times \frac{EA}{MA} = 100 \times \frac{EA/CA}{MA/CA} = 100 \times \frac{EQ}{IQ}
\]


where EA, MA, and CA indicate respectively the educational, mental, 
and chronological ages of the pupil expressed in months and EQ and 
IQ designate respectively his relative educational development and 
brightness. 

For example, if a child has a mental age of ten years (120 months) 
and an educational age of nine years (108 months), his 

\[
AQ = 100 \times \frac{\text{9-0}}{\text{10-0}} = 100 \times \frac{108}{120} = 90.
\]

If a pupil's achievement (EA) is in keeping with his ability to
learn (MA), his AQ will be 100. Obviously, the indication of an AQ
below 100 should be that the child is not working to capacity and an
AQ of more than 100 should be impossible. However, a study of
highly motivated instructional drives on certain content⁴⁶ showed
that an AQ of more than 100 is attainable. It is certain, however,
that no one can achieve at more than 100 per cent of his capacity.
Therefore, it would appear that such accomplishment quotients result 
from norms on achievement tests that are not high in reliability. 
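The two forms of the accomplishment quotient can be sketched side by side, using the example of the pupil with an educational age of 108 months and a mental age of 120 months. The multiplication by 100 and the form EQ = 100 × EA / CA are assumed by analogy with the IQ; the chronological age of 120 months is invented for the illustration, and the function names are hypothetical.

```python
def aq_from_ages(ea_months: float, ma_months: float) -> float:
    """AQ as the ratio of educational age to mental age, times 100."""
    return 100 * ea_months / ma_months


def aq_from_quotients(eq: float, iq: float) -> float:
    """AQ as the ratio of educational quotient to intelligence quotient, times 100."""
    return 100 * eq / iq


ea, ma, ca = 108, 120, 120   # months; the CA is an assumed value
eq = 100 * ea / ca           # assumed form of the educational quotient
iq = 100 * ma / ca

print(aq_from_ages(ea, ma))       # from the two ages
print(aq_from_quotients(eq, iq))  # from the two quotients
```

Both routes give 90 because the chronological age cancels out of the ratio, which is why the several adaptations of the formula are equivalent.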

There is evidence to show that higher accomplishment quotients 
are more frequently obtained in particular grade groups by the 
intellectually inferior than by the intellectually superior pupils.⁴⁷
This probably is true largely because of the fact that the instructional
levels of most schools are geared to the average and inferior
pupils and that the curriculum frequently does not have enough
“top” adequately to interest and motivate superior pupils. Therefore,
an AQ below 100 may indicate poor effort, a high IQ, or both,
and an AQ of more than 100 may indicate unusual effort, a low IQ,
or both.


46 W. E. Lessenger, Motivation and the Accomplishment Quotient Technique.
University of Iowa Studies in Education, Vol. III, No. 2. University of Iowa,
Iowa City, 1925.

47 Harl R. Douglass and C. L. Huffaker, “Correlation between Intelligence and
Accomplishment Quotient.” Journal of Applied Psychology, 13:76-80; February
1929.

Another weakness of the AQ is its low reliability.⁴⁸ The two measures
that enter into the ratio (EA and MA, or EQ and IQ) are themselves
not highly reliable for this comparison, and the quotient of two
unreliable measures is less reliable than either of the measures,
so that the AQ cannot be highly reliable. In defense of
these ages and quotients, it should be said that they have satisfactory 
degrees of accuracy for their normal uses but that they may not be 
sufficiently reliable for use in the ratios for obtaining the AQ. 

A sound conclusion, growing out of the above and other more 
technical evaluations of the AQ, seems to be that its use with indi- 
vidual pupils is probably not justified but that it can satisfactorily 
be used for groups of pupils.


Index of studiousness 


An index of studiousness that attempts to relate ability to performance
in the classroom has been proposed.⁴⁹ In its simplest form
this measure is the difference between a pupil's rank in his class on 
intelligence and on achievement as they are measured by standard- 
ized tests. The index of studiousness is practically limited in compar- 
able application to pupils within a class or instructional group and 
was recommended by its originator primarily for use in the high 
school. 
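In its simplest form the index is a difference between two class ranks, which can be sketched as follows. The pupils and scores are invented. The sign convention (intelligence rank minus achievement rank, with rank 1 as the highest score, so that a positive value means the pupil ranks higher in achievement than in intelligence) is an assumption and is not taken from the originator's description. Tied scores are ranked in the order in which they are listed.

```python
def class_ranks(scores: dict) -> dict:
    """Rank pupils within the class, with rank 1 for the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {pupil: position for position, pupil in enumerate(ordered, start=1)}


def studiousness_index(intelligence: dict, achievement: dict) -> dict:
    """Intelligence rank minus achievement rank for each pupil (assumed sign)."""
    i_rank = class_ranks(intelligence)
    a_rank = class_ranks(achievement)
    return {pupil: i_rank[pupil] - a_rank[pupil] for pupil in intelligence}


# Hypothetical standardized test scores for a class of four pupils.
intelligence = {"A": 120, "B": 110, "C": 100, "D": 90}
achievement = {"A": 70, "B": 85, "C": 80, "D": 60}

print(studiousness_index(intelligence, achievement))
```

Pupil A ranks first in intelligence but third in achievement and so receives an index of -2, while pupils B and C each stand one rank higher in achievement than in intelligence. Since the ranks have meaning only within the class, the index cannot be compared from one class to another.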


10 GENERAL PROCEDURES FOR INTELLIGENCE TESTING


A large part of the continuing popularity of intelligence tests 
among teachers and supervisors may be traced to three main causes: 
(1) the tests themselves have been greatly improved in the accuracy 


48 J. Crosby Chapman, “The Unreliability of the Difference between Intelligence 
and Educational Ratings.” Journal of Educational Psychology, 14:103-8; February
1923. 

49 Percival M. Symonds, Measurement in Secondary Education. Macmillan Co., 
New York, 1928. p. 521-25.

and analytical value of the resulting measures; (2) a larger proportion
of school officers have become intimately acquainted with intelligence
tests and testing procedures, with a correspondingly greater
appreciation of the functions they serve; and (3) the changes in 
modern conceptions of education and attitudes toward it have made 
the utilization of such devices almost essential. Therefore, intelli- 
gence, aptitude, and group-factor tests are important tools of en- 
lightened teaching procedure. 


Administering and scoring intelligence tests 


During the early years of the intelligence testing movement the 
classroom teacher was given little part in the testing procedures and 
frequently was even denied access to the results. However, as 
teachers have become more conversant with intelligence testing 
techniques and the use of results, they have been given more 
responsibility in the administration and scoring of the tests and in 
the interpretation and use of results of group intelligence tests. The 
administration of individual intelligence tests and performance tests 
should remain a responsibility of the psychologist rather than that 
of the classroom teacher. 

In many schools today, teachers administer, score, and quite often 
interpret the results of group intelligence tests. However, too great 
care cannot be taken by the teacher who participates in an intelli- 
gence testing program to understand the procedures for administering 
and scoring the tests and to follow the practices recommended by 
the test author, for it is only by such strict adherence to proper 
methods that reliability of the results is assured. 


Care in the use of intelligence test results 


On the whole intelligence tests seem secure in the place they now 
hold as indispensable supporting tools for achievement tests, as 
valuable instruments for the more exact classification of pupils, and 
as guides to the teacher in matters of pupil behavior and conduct 
and to the pupil himself in certain vocational and related matters. 
There are, however, a few dangers attached to their careless or 
indiscriminate use which the teacher and administrator should guard 
against. The more important of these dangers are probably social 
in their character.

In the first place, there is the danger that may arise through giving
publicity to the results of intelligence testing. In the long run, nothing 
but damage will ordinarily be done by using intelligence test results 
for any other than purely school purposes, and then they should be 
used in strict confidence. A second danger in the careless use of such 
tests lies in the effect that knowledge of his own intelligence may 
have on the individual. The safest practice is to restrict informa- 
tion concerning results of intelligence testing to responsible school 
officers and teachers in the main, to make such information available 
to parents only in occasional and well-considered instances where 
need arises, and to withhold such information from pupils them- 
selves until they reach senior high school or perhaps even the college 
level. In no case does it seem justifiable to make intelligence quo- 
tients of individual pupils known to any persons other than their 
teachers and school officers, their parents, and themselves. 


11 VALUES AND USES OF DIFFERENT TYPES OF TESTS 


As tests of general intelligence, of aptitude and readiness, of group 
factors of intelligence, and of performance differ widely in type, 
mode of use, and nature of the resulting scores, it is inevitable that 
the situations in which they are most appropriately used must also 
differ. 


General intelligence tests 


There is wide use for the results of general intelligence tests in the 
classroom. Results from group tests must be interpreted cautiously, 
however, for these indirect measures of adaptability or of ability to 
learn are often not highly reliable. Results that may safely be used 
for group interpretations may well be too unreliable for individual 
pupil interpretations. The best safeguard is to administer group 
tests frequently—perhaps every two or three years—during the 
school career of the pupil and to judge pupil intelligence more in 
terms of average intelligence quotients than in terms of the results 
from any one test, even the most recent, alone. For pupils who have 
very low or very high IQs and for pupils who are poorly adjusted to
school, the administration of an individual intelligence test by a
school psychologist is desirable. It is for the maladjusted, dull or
borderline, and superior pupils that group test results are most likely
to be unreliable. Furthermore, it is for pupils of these types that the
attainment of optimum adjustment in the school is the most difficult. 
Hence, the significance of results from individual intelligence tests 
in such cases is great. 

Individual diagnosis. The intelligence test may prove especially 
valuable to the classroom teacher in assisting him to solve the prob- 
lems relating to the unusual child. The pupil may be unusually 
bright, troublesome, dull, or in some other way quite out of the 
ordinary. The teacher may wish to know whether this child's typical 
responses reveal his real general ability, and whether or not the 
judgments of his former teachers and supervisors are correct. Intelli- 
gence tests will give information not obtainable in any other way. 

The intelligence test, when given to an entire group, frequently 
uncovers a child of outstanding ability who has been content to go on 
with the group without revealing his real ability. Such tests invar- 
iably uncover cases of overlapping in ability just as achievement 
tests reveal cases of overlapping in school achievement. They may 
reveal children in the fourth and fifth grades with the mental ability 
of normal seventh- or eighth-grade pupils, or fourth- and fifth-grade 
pupils who in mental age are at the second- or third-grade level. 

When children are discovered who are mentally far in advance of 
their place in school, readjustments of work should be made to 
match their abilities. This may be accomplished by (т) advancing 
them to a grade where their intelligence is given a real test, (2) 
placing them in rapidly moving classes, so that they may progress 
according to ability rather than by some fixed promotion scheme, or 
(3) declaring minimal requirements for the entire class in the units 
of work to be done and then expecting the brighter pupils to attack 
the various problems at higher levels and more intensively than 
could be expected of the class as a whole. 

By proceeding along similar lines, the teacher will better under- 
stand the dull pupil because his difficulties can then be diagnosed 
more particularly and his strong points brought into relief. Quite 
frequently this results in an entire redirection and reorganization of 
his instruction. In any case, the intelligence test will assist both 
in explaining difficult cases and in revealing unsuspected general 
strengths and weaknesses. 

Educational guidance. The use of intelligence test results for 
educational guidance is similar to but goes far beyond their use for 
individual pupil diagnosis. Pupils can be much more effectively
advised in their selection of courses and of curricula, and, by the
same token, courses can better be adapted to their needs, if informa- 
tion is available concerning their intellectual levels. Pupils may be 
better qualified for certain types of courses or curricula than for 
others, in terms of their levels of intelligence. Evidence now avail- 
able from some tests concerning ability levels in several areas or 
types of performance makes even more significant than formerly the 
possibilities of using intelligence test results in this manner. 

Vocational guidance. The dividing line between educational and
vocational guidance cannot be clearly drawn, for the first merges 
gradually into the second. Whereas educational guidance is of pri- 
mary concern in the elementary school, even there it has its voca- 
tional implications. Vocational aspects of guidance assume an 
increasingly prominent position as the pupil progresses through 
junior and senior high school and in many instances nears the end 
of his school career. Although intelligence test results can be used 
with less confidence for vocational than for educational guidance, 
the information they furnish concerning the general intellectual 
abilities of pupils is of great value in vocational counseling. 

Class analysis and diagnosis. Viewed from the standpoint of the 
teacher, achievement tests and intelligence tests are supplementary 
devices. After the teacher has given achievement tests and compared 
his class with the norms in a given subject, he is still in danger of 
making false assumptions about the significance of these results 
unless he has available further information such as is furnished by 
intelligence tests. He may credit himself with an excellent job of 
teaching when the innate brilliance of the group in his charge is such 
that, if they were given really adequate instruction, much superior 
results might have been achieved. Or it may be that the class falls 
so far below the norm that the teacher may feel that his teaching has 
proven a failure. This may also be an unwarranted assumption, for 
the class may be considerably below average in intelligence and can- 
not be expected to approximate the norms that are set up for a class 
of average ability. It is evident, then, that there is a need for some 
means of determining approximately the intellectual ability of the 
class. Intelligence tests meet this need. By giving one or more such 
tests, the teacher can determine with a fair degree of accuracy 
whether his class is up to the normal expectation in ability to master 
schoolwork. 




Specific intelligence tests 


Much of what has been said concerning the uses of general intelli- 
gence tests applies also to aptitude and readiness tests. However, the 
specific nature of these types of tests limits their significance to only
some of the uses that are properly made of general intelligence tests.

Aptitude tests are valuable for individual pupil diagnosis, educa- 
tional guidance, and vocational guidance. However, they have less 
significance for class analysis and diagnosis because they measure 
such specific abilities that individual pupil characteristics assume 
much greater importance than do characteristics of the class as a 
whole. Aptitude tests are primarily suited for use with pupils of the 
junior high school or higher levels, for the general and non-specialized 
type of course in the elementary school is less well-adapted to apti- 
tude testing than are the more specialized courses of the high school 
and the college. 

Readiness tests are useful for individual pupil diagnosis and for 
educational guidance but seem to have little significance for voca- 
tional guidance or class analysis and diagnosis. Such tests also have 
specific rather than general significance, so that the results from their 
use should be interpreted for the pupil as an individual rather than 
on the basis of the class group. 


Group-factor tests of intelligence 


The group-factor tests, lying perhaps midway between general 
intelligence and aptitude tests in specificity, have uses similar to 
those outlined above. As these instruments have been developed 
comparatively recently, scores resulting from their use should be 
interpreted with caution and primarily by educational and vocational 
counselors until their validities for various purposes have become 
well established. 

The bi-factor tests, most often supplying a verbal and a non- 
verbal score, doubtless distinguish two major factors of ability at 
any level from the intermediate grades to the college years. Boys 
regularly attain higher mean scores on the non-verbal sections than 
do girls, whereas the sex difference is typically in the opposite 
direction for verbal scores. Overlaps between the sexes are very 
great, however, so that many girls score far above the mean for boys 
on the non-verbal tests and many boys surpass the mean for girls
on the verbal tests by a wide margin. The diagnostic significance of 
these part scores seems sufficiently well established to warrant their 
cautious use in individual pupil diagnosis and for educational guid- 
ance when supporting evidence of other types is at hand. Their use 
in vocational guidance and for class diagnosis seems to be somewhat 
less appropriate. 

Multi-factor tests of primary mental abilities and differential 
aptitudes have not yet resulted in a clear-cut and generally accepted 
list of ability factors, nor have the validities of the various part 
scores been established to the point where their predictive significance
is well known. Somewhat more widely available for the high-school
and college than for the intermediate grade levels, their
major uses appear to be in the areas of individual diagnosis and
vocational and educational guidance. It seems desirable at the present
time to use results from multi-factor tests for pupil guidance only
in conjunction with other data of well-established validity. 


Performance tests of intelligence 


Performance tests are less frequently a tool of the classroom 
teacher than of the educational or vocational counselor. Pupils who 
have visual, language, or physical handicaps that preclude reliable 
testing of their abilities by group intelligence tests should be tested 
by individual intelligence scales or performance tests. The uses of 
results from performance tests do not differ significantly from the 
uses of group intelligence test results except that performance tests 
furnish less accurate measures of general intelligence than do group 
and individual intelligence tests and therefore should be employed 
with caution. 


Topics for Discussion 


1. Give several of the most meaningful definitions of intelligence.
Which definition is most acceptable to you? Why? 

2. Briefly discuss and evaluate the three theories concerning the nature 
of intelligence that are presented in the chapter. 

3. Show how a high score on an intelligence test affords a basis for 
inferring the existence of a high degree of mental ability. 

4. What are the most appropriate uses for group intelligence tests?
For individual intelligence tests?

5. Discuss the theoretical foundation upon which specific intelligence
tests depend.

6. Distinguish between aptitude tests and readiness tests.

7. Upon what theoretical foundation do group-factor tests of intelligence
rest?

8. Distinguish between bi-factor and multi-factor tests of intelligence.

9. For what purposes are performance tests ordinarily used?

10. Discuss fully the most commonly used measures of mental maturity
and brightness.

11. To approximately what age does mental growth continue?

12. Does a person's intelligence quotient remain constant throughout
his life? Give evidence to support your answer.

13. What would a culturally-fair intelligence test accomplish that standard
intelligence tests may not accomplish?

14. Discuss the use of percentile scores and standard scores for the
interpretation of intelligence test results.

15. How is intelligence distributed among the population as a whole?

16. What does the accomplishment quotient attempt to show? Discuss
its defects and proper uses.

17. If a child has a CA of 12-6 and an MA of 15-10, what is his IQ?
If his EA is 14-6, what is his AQ?

18. If the parents of a sixth-grade child who had a CA of 13-6 and an
MA of 10-1 called upon you to discuss his poor work in arithmetic,
what would you tell them about the causes of the child's deficiency
in arithmetic?

19. List and discuss some of the ways in which intelligence test results
are useful in the classroom.

20. Under what conditions, if any, do you think classroom teachers
should be responsible for giving and scoring intelligence tests?

21. Propose a program to be followed in a school for the recording and
use of intelligence quotients or other derived scores of intelligence.
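
The arithmetic called for in the question above that gives a CA of 12-6, an MA of 15-10, and an EA of 14-6 can be sketched in Python. This is only an illustration and assumes the conventional ratio definitions, IQ = 100 x MA / CA and AQ = 100 x EA / MA, with ages written as years-months; the function names are hypothetical and chosen for this sketch only.

```python
def months(age: str) -> int:
    """Convert an age written as 'years-months' (e.g. '12-6') to months."""
    years, extra_months = age.split("-")
    return int(years) * 12 + int(extra_months)


def quotient(numerator_age: str, denominator_age: str) -> float:
    """Return 100 times the ratio of two ages, both expressed in months."""
    return 100 * months(numerator_age) / months(denominator_age)


def intelligence_quotient(ma: str, ca: str) -> float:
    # Assumed definition: mental age divided by chronological age, times 100.
    return quotient(ma, ca)


def accomplishment_quotient(ea: str, ma: str) -> float:
    # Assumed definition: educational age divided by mental age, times 100.
    return quotient(ea, ma)


iq = intelligence_quotient(ma="15-10", ca="12-6")   # 190 months / 150 months
aq = accomplishment_quotient(ea="14-6", ma="15-10")  # 174 months / 190 months
print(round(iq), round(aq))
```

Converting every age to months before dividing avoids the common slip of treating "12-6" as the decimal 12.6 years.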
Using Personality Instruments
and Techniques 


THE FOLLOWING aspects of personality and its measurement are dis- 
cussed in this chapter: 


Nature of personality. 

Methods of personality measurement. 

Nature and measurement of attitudes. 

Nature and measurement of interests. 

Significance and measurement of emotional adjustment. 
Measurement of total personality. 




Teachers are expected to understand their pupils, and through this
understanding to increase the efficiency of their teaching. To the 
teacher of some decades ago, all pupils were essentially alike. The 
modern teacher should have a knowledge of child psychology and 
the nature of individual differences in intelligence, achievement, and 
other important aspects of behavior. Many teachers give too little 
attention to the personality aspects of child behavior, preferring to
work with the more readily observable and more tangible phases of 
behavior such as those treated in the chapters on achievement and 
intelligence testing. It is probable that teacher-education institutions 
even today too infrequently provide teachers in training with ade- 
quate instruction concerning pupil personality in a functional sense. 
Wherever the fault may lie, attention is increasingly being directed 
toward the more effective adjustment of the school to the needs of 
the child and of the child to life. Thus efficient teaching demands
more than a chance and casual acquaintance with personality testing
techniques.


1 NATURE OF PERSONALITY


Man has for centuries been aware of differences among individuals 
and has made many attempts to classify them. As early as about 300 B.C.,
Theophrastus divided men into thirty universal types,1 of which the
dissimulator, the flatterer, the chatterer, and the rustic are representative.
Hippocrates, several centuries B.C., distinguished persons of
the sanguine, choleric, melancholic, and phlegmatic characteristics
and explained these various types of temperaments by excesses of
the bodily fluids or “humors” he called blood, yellow bile, black
bile, and phlegm, respectively.2 Palmistry, phrenology, numerology,
and graphology have long made claims concerning their ability to
diagnose personality. More recently, Kretschmer divided men by
their physical characteristics into four types distinguishable by certain
general personality characteristics,3 and Berman emphasized the
influence of secretions from the endocrine or ductless glands upon
personality.4 Garrett referred to physiognomy, body type theories,
and glandular theories as impressionistic and noted the contrast
between their extravagant claims and the exceedingly meager results
they produce.5

Jung distinguished extrovertive and introvertive types of individuals,6
and his classification has to a considerable degree found
its way into popular usage. More recently still, however, psycholo- 
gists have increasingly turned their attention to the study of and 
attempts to measure personality. The concept of types evidenced 
in most of the early and many of the rather recent attempts to 
evaluate personality has largely been abandoned by modern personality
testers, for personality types are inconsistent with the
"normal curve" distribution that has been found to apply to personality
traits as well as to intelligence and achievement.

1 Richard Aldington, editor, A Book of Characters from Theophrastus. E. P.
Dutton and Co., New York, 1924.

2 Laurance F. Shaffer, The Psychology of Adjustment. Houghton Mifflin Co.,
Boston, 1936. p. 284.

3 Ernst Kretschmer, Physique and Character. Harcourt, Brace and Co., New
York, 1925.

4 Louis Berman, The Glands Regulating Personality. Macmillan Co., New York,
1921.

5 Henry E. Garrett, Great Experiments in Psychology, Third edition. Appleton-Century-Crofts,
Inc., New York, 1951. p. 175-82.

6 C. G. Jung, Psychological Types. (Translated by H. G. Baynes.) Harcourt,
Brace and Co., New York, 1923.

Personality was at one time thought to be largely if not entirely 
the result of biological inheritance. However, most authorities today 
prefer the view that it is the resultant of both hereditary and environmental
factors.7 Psychoanalysts believe that many of the personality
difficulties found among adults are caused primarily by
experiences of early childhood, in many cases forgotten by the adult. If 
personality characteristics are the result in significant measure of 
the environment, which seems a justifiable conclusion, it is impor- 
tant for the teacher to be alert to the influence of the school in shap- 
ing the personality of the child as well as to its potentialities for 
correcting the maladjustments that pupils may have acquired prior 
to school entrance. 


Definitions of personality 


Personality is the most inclusive term that can be used in the dis- 
cussion of human behavior. Psychologists are not in complete agree- 
ment concerning the meaning of the term, but they recognize that 
personality describes more fundamental types of human behavior 
than the surface evidences by which the man on the street evaluates 
it. In general, psychological definitions of personality explain what
personality is in terms of the types of human behavior thought to 
contribute to it. Psychologists agree roughly upon these components 
of personality, but they usually resort to indirect methods of defining
the term.

Shaffer stated that the “personality traits of an individual are his 
persistent habits toward making certain types of adjustments rather 
than other kinds.” 8 Traxler considered the term to include the
“sum total of an individual’s behavior in social situations.” 9

These statements concerning personality seem to describe as well 
as possible in a non-technical manner what personality is. They 
perhaps represent the most meaningful view of personality for
teachers and other persons who are not technical workers in the field
of personality study. It should be kept clearly in mind that the
behavior of the individual is controlled by his personality and at
the same time furnishes the evidence by which his personality can
in part be evaluated.

7 Willard C. Olson, “Personality.” Encyclopedia of Educational Research, Revised
edition. Macmillan Co., New York, 1950. p. 807.

8 Shaffer, op. cit. p. 132.

9 Arthur E. Traxler, Techniques of Guidance: Tests, Records, and Counseling in
a Guidance Program. Harper and Brothers, New York, 1945. p. 100.


Aspects of personality 


If personality is most satisfactorily described at present in terms 
of how it is constituted, it is understandable that approaches to 
personality study and measurement have been largely in terms of 
personality traits. Psychologists divide personality into many areas 
for study. However, the aspects of personality useful to the teacher 
can well be listed under fewer headings, although any classification 
must be largely arbitrary. The phases of personality treated in this 
chapter are grouped under the headings of attitudes, interests, emo- 
tional adjustment, and total personality. It is believed that these are 
the areas of greatest present significance to the teacher. 

Although the preceding statements include intellectual and physi- 
cal traits as components of personality, these traits are not generally 
considered when personality measurement is undertaken. They are 
measured by different techniques in established areas of testing. 
Consequently, although the psychology of personality rightly deals 
with their findings and no one should lose sight of the contributions 
of intellectual and physical traits to an individual's development, 
these areas are not of direct concern here. 


2 TECHNIQUES OF PERSONALITY MEASUREMENT 


Personality is measured by several different types of approaches. 
Among those most commonly used are (1) free association, (2) 
direct observation of behavior, (3) rating scales, and (4) personal 
reports. Although all of these methods can be used by an intelligent 
classroom teacher, it is probable that observation of behavior and 
personal reports are the methods most practicable and useful in the 
typical classroom. Each of these methods is discussed briefly in 
this section of the chapter. In the later sections various methods of 
measurement are discussed in terms of their uses for the evaluation 
of attitudes, interests, emotional adjustment, and total personality.

Personality testing is probably the newest area of measurement
that bears directly upon the work of the classroom teacher. Although 
achievement and, to a less extent, intelligence, are subject to quanti- 
tative measurement and now have rather widely accepted technical 
terminologies to aid in the interpretation of testing results, such is 
not the case for personality testing. Results from personality tests 
are often difficult to interpret because the area has practically no 
derived scores, such as the educational age and the intelligence 
quotient, which have commonly accepted meanings. The effect of 
this situation is that personality test results must be interpreted 
largely in terms of the special types of derived scores and norms 
provided for the particular test or technique used and then fre- 
quently by qualitative rather than quantitative statements. 


Association methods 


An association method was one of the earliest to be employed 
in the measurement of behavior, for it apparently was first used by 
Galton as early as 1879.10 Its development in the modern sense has
occurred mainly since 1910, however.

Two association methods are now being quite widely used in the 
study of personality: (1) verbal association techniques, and (2)
visual stimulus techniques. Although word association methods were 
known long before modern projective methods evolved, both types 
are now known as projective techniques. 

Verbal associations are established when the person to whom a 
word is spoken responds with the first word that enters his mind. 
Other free-association procedures are based on completions given 
to incomplete sentences and to partly told stories. 

The best-known visual stimulus methods are based on free re- 
sponses to inkblots and pictures. In these situations the subject 
is to respond by telling what he is reminded of or what he sees in 
each. Both the nature of the responses and the manner in which 
they are given furnish considerable evidence to the experienced 
psychologist on which to base inferences concerning emotional disturbances
in the subject.

10 Francis Galton, “Psychometric Experiments.” Brain, 2:149-62; July 1879.


Observational methods


Several different methods based on the observation of individual 
pupil behavior have been suggested and successfully applied. They 
all probably require an ability that few teachers have but most can 
acquire. Untrained teachers make use of their own interpretations 
of events they observe, whereas objectivity is attained only by a 
rather rigid account of what actually occurred. The characteristics 
of objective observational methods highly useful in the study of 
pupil personality and adjustment make it inadvisable for inexperi- 
enced teachers to attempt to make more than experimental use of 
them until some experience in observation has been acquired. 

The two common observational procedures most applicable in the
school are (1) directed observation and (2) the anecdotal method.
The first, because the observation is directed toward a particular 
pupil or pupil group under specified conditions, is a laboratory rather 
than a classroom procedure. The second, however, uses the results 
from observations of pupil behavior made at any time, and there- 
fore is definitely a classroom method of evaluation. 

Certain observational procedures are generally known as projec- 
tive methods. In using these, the child is presented with such mate- 
rials as sand, clay, toys, or paints and his use of the material 
presented is carefully observed by the psychologist. Much is revealed 
to the experienced observer concerning the conscious and even un- 
conscious motives, attitudes, interests, and needs of the individual 
by this approach. 


Rating scales 


Rating scales are widely used in the evaluation of pupil personality. 
In this procedure, the teacher or some other person intimately ac- 
quainted with the pupils rates them on personality traits in terms 
of the manner in which the individuals have impressed the rater. 
Obviously, the judges should know intimately the pupils they are 
rating. Most rating methods suffer in accuracy because some raters 
tend to be too lenient whereas others are too critical. They are less 
accurate for use with intangible traits, upon which observers usually 
vary rather widely in their evaluations, than in such readily observ- 
able characteristics as neatness and cleanliness. 

Widely used among rating techniques are the graphic rating scale 
and variations of that form. In this procedure, the judge places a
check mark at a certain position on a line to indicate his evaluation 
of the person he is rating. The line may be divided into five (or some 
other number of) sections designated superior, good, average, poor, 
and inferior, or meaning may be given to positions on the scale by 
other and more definitely descriptive terms. Again, there may be 
designations at occasional intervals beneath the line to indicate 
specifically for each trait varying evidences of its possession by the 
person being rated. Several personality rating scales are discussed 
later in this chapter. 
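
The way a check mark on a five-section graphic scale translates into a recorded rating can be sketched in Python. This is a minimal illustration, not a description of any published scale: it assumes the sections are of equal width, that the left end of the line is "superior," and that the position of the mark is recorded as a fraction of the line's length; the function name is hypothetical.

```python
# Section labels in left-to-right order (assumed ordering for this sketch).
LABELS = ("superior", "good", "average", "poor", "inferior")


def section_for_mark(position: float, labels=LABELS) -> str:
    """Return the section label for a mark placed at `position`.

    `position` is the distance of the check mark from the left end of the
    line, expressed as a fraction between 0.0 and 1.0 (an assumption of
    this sketch). Sections are treated as equal in width.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 and 1.0")
    # A mark at the extreme right end belongs to the last section.
    index = min(int(position * len(labels)), len(labels) - 1)
    return labels[index]


print(section_for_mark(0.1))   # mark near the left end
print(section_for_mark(0.5))   # mark at the middle of the line
print(section_for_mark(0.95))  # mark near the right end
```

Passing a different tuple of labels accommodates scales divided into some other number of sections or labeled with more descriptive terms.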


Personal reports 


The personal report method makes use of what are variously
called scales, inventories, questionnaires, and blanks. The responses 
are given, or the instruments are filled out, by the pupils them- 
selves. As many of the items on these instruments request highly 
personal responses, the personal report method of measuring person- 
ality suffers from the fact that pupils sometimes reply as they think 
they should reply rather than as they truly react to the various items. 
Most persons are hesitant in revealing their inner personalities to 
other persons freely. In fact, the customs of civilized society place
something of a premium upon the ability to hide or disguise emo- 
tions, likes and dislikes, attitudes, and other reactions in many 
situations. Therefore, it is not surprising that pupils sometimes fail 
to answer personality inventory items truthfully. Despite this major 
weakness, personal report instruments for the measurement of per- 
sonality are of considerable value in their classroom uses. 


3 MEASUREMENT OF ATTITUDES


A significant portion of the teacher's time in the classroom is 
directly or indirectly devoted to the development in pupils of 
desirable social attitudes and modes of behavior. Illustrations are the 
emphasis in the school upon good citizenship, cooperation with 
others, intellectual honesty, and the scientific attitude. Furthermore, 
many courses in the school attempt to develop attitudes that are in 
many cases more specific than those mentioned above. For example, 
the teacher of English attempts to develop favorable attitudes to- 
ward correct usage and good literature in his pupils, and the teacher 
of civics to develop democratic ideals and belief in democratic
institutions. Lists of course objectives include many such attitudes, 
ideals, or beliefs which the school strives to develop or improve in 
the pupil. It seems important, then, that teachers be conversant with
instruments for measuring attitudes of various types. 


The nature of attitudes 


Thurstone and Chave defined an attitude as “the sum total of a 
man's inclinations and feelings, prejudice or bias, preconceived 
notions, ideas, fears, threats and convictions about any specific 
topic.” 11 An attitude is a state of readiness that exerts a directive, 
and sometimes a compulsive, influence upon an individual's behavior. 

Attitudes may be either general or specific. For example, a person 
who has a general attitude of liberalism may behave in a highly 
conservative manner in a particular situation in which his personal 
welfare may be threatened. An attitude of conservatism is general, 
but an attitude toward a certain person is specific. This brief indica- 
tion of the nature of attitudes will furnish the student sufficient back- 
ground concerning the psychological characteristics of attitudes for 
the brief consideration of measuring instruments presented in this 
chapter. 


Methods of attitudes measurement 


Attitudes are measured by several different methods, among the 
most common of which are the scale or questionnaire and the inter- 
view. As the teacher ordinarily has more use for the attitudes scale 
than for the interview, only brief mention will be made of the inter- 
view as a device for determining attitudes and opinions. 

Attitudes scales. The two series of attitudes scales most widely 
used and known are the Thurstone Scales for the Measurement of 
Social Attitudes and the Generalized Attitudes Scales devised by 
Remmers and his associates. The former measure attitudes toward 
areas where differences of opinion exist, such as censorship, immigra- 
tion, unions, war, capital punishment, and the movies, whereas the 
latter measure such attitudes as those toward any social institution, 
any racial or national group, any vocation, and any school subject. 


11 L. L. Thurstone and E. J. Chave, The Measurement of Attitude. University
of Chicago Press, Chicago, 1929. p. 6-7.

Sample items from the Around the World attitudes inventory for
pupils from the sixth to the tenth grade are presented in an accom- 
panying illustration. This inventory measures attitudes toward vari- 
ous phases of international relations, war, patriotism, and agencies 
for peace. 


Sample items from "Around the World" Attitudes Inventory 12
1. YES OR NO?


Below are a number of statements, some of which are true and some are 
false. Underline YES if you think a statement is true. If you think it 
is not true, underline NO. 


1. Most foreigners are less intelligent than Americans .. YES NO
In most American homes there are things made out of 
material that came from other countries .......... YES NO 


10. In modern warfare people who live far from the fighting
are often in great danger


Another illustration of an attitudes measurement technique is that 
of the Health Attitudes Inventory. This is one of the six health inventories
developed by the Cooperative Study in General Education.


Excerpt from Health Attitudes Inventory 13


Directions: Read each of the following statements and on the appropriate line of the answer sheet blacken 
the space under 


A if you agree with the whole statement, 
D if you disagree with the whole statement, 
U if you are uncertain how you feel about the whole statement. 


STATEMENTS 
1. People lose their hair because they do not take care of it in a way which is known to be right. 


2. Before shaking hands with a person, one should if possible make sure that he does not have any 
skin disorders.


3. If two people maintain that the use of certain foods cause different effects on their skins, one or 
the other or both probably made incorrect observations. 


4. Ordinarily there is little danger of catching a disease by wearing new clothes before they are 
laundered. 


12 Adelaide T. Case and Paul M. Limbert, Around the World. Published by 
Association Press, 1932. 

13 Cooperative Study in General Education, Health Attitudes, Health Inventory 
No. IV. Published by Educational Testing Service, 1950. 
The interview. The method of determining attitudes by the use
of the interview is similar to the method of general interviewing dis- 
cussed in Chapter 9. However, the interview for purposes of attitudes 
measurement is usually restricted to rather direct and somewhat 
standardized questioning on the particular issue toward which at- 
titudes are being measured. 


4 MEASUREMENT OF INTERESTS


The attention of classroom teachers has increasingly turned of 
late years to pupil interests, as a result of the emphasis now placed 
upon the adaptation of the school offerings to the abilities, needs, 
and interests of pupils. Furthermore, the vocational and avocational 
interests of children have increasingly received attention during the 
last few decades as a means of aiding the pupil in the selection of his 
school courses and curricula and his life vocation. The school ob- 
viously cannot adapt its offerings to pupil interests and guide pupils 
in their selection of courses in terms of interests if the nature of those 
interests is unknown. 


The nature of interests 


Interests today are most often classified in terms of the objects 
and activities from which the individual obtains satisfaction.14 Thus,
a person is interested in football but cares very little for tennis, or 
he is interested in music but is not interested in the drama. It is 
in this non-technical manner of considering interests that measure- 
ment in this field can be most meaningful for the teacher. 


Methods of interests measurement 


Interests are subject to measurement both by standardized interests
inventories and by informal methods. Brief mention will be
made here of informal testing methods. Interests inventories will be 
treated somewhat more fully. 

Interests inventories. It is perhaps because inventories
of pupil interests in objects and activities are so easy to make
that standardized inventories most often deal with interests from
the standpoint of their predictive or diagnostic values for important
types of behavior. The result is that certain types of instruments
perhaps best classified as inventories do not differ greatly from attitudes
scales and that other types are very similar to, or in effect
may be, adjustment inventories.

14 Douglas Fryer, The Measurement of Interests. Henry Holt and Co., New
York, 1931. p. 15.

The Pressey Interest-Attitude Test, a brief excerpt from which 
appears on page 60, is an illustration of the first type. Another illustration
in a non-vocational interest area is the Health Interests
Inventory, from which an excerpt appears in an accompanying illus- 
tration. A comparison of this and the excerpt from the Health 
Attitudes Inventory on page 286 may help to clarify the minor 
distinction between attitudes scales and general interests inventories. 


Excerpt from Health Interests Inventory 15


Directions: Read each of the questions below and on the appropriate line of the answer sheet blacken 
the space under 

A if the question interests you and you feel it should be dealt with in school,

B if the question interests you but you feel it should not be dealt with in school, 

C if the question does not interest you. 


1. Do certain diseases of the skin result from beauty-parlor or barbershop treatments? 
2. Are pimples caused by poor digestion? 

3. If the skin is dry and becomes itchy after bathing, what should be done? 

4. What is the proper treatment for boils? 


15. How can athlete's foot be cured? 


Probably the best-known measuring instrument in the field is 
the Strong Vocational Interest Blank. This inventory is not intended 
for use below the senior high school and college levels because of 
the transient nature of interests in vocations at lower age levels. In 
common with many other interests inventories, the Strong blank has 
separate forms for young men and young women in order to provide 
for the types of sex differences usually found to exist in interests. 

Persons taking the Strong blank are asked to respond to items
dealing with the following: (1) occupations, (2) school subjects,
(3) amusements, (4) activities, (5) peculiarities of people, (6) order
of preference of activities, (7) comparison of interest between two
items, and (8) rating of present abilities and characteristics. They
designate their interests on a three-point scale for most of the items
to indicate their degree of liking. The samples of the accompanying
illustration show the nature of differences in the men's and women's
occupations items, the identical nature of items on present abilities
and characteristics in the men's and women's forms, and two of the
methods of responding to the items.

15 Cooperative Study in General Education, Health Interests, Health Inventory
No. III. Published by Educational Testing Service, 1950.

Scoring of the instrument requires quite lengthy procedures in- 
volving the use of varying positive, zero, and negative weights differ- 
ing for the same response according to the particular vocation for 
which the blank is being scored. The men's form can be scored for 
35 different occupations, as well as for several occupational groups 
and two special occupational indices.16 The women’s form can be
scored for 16 occupations and one special occupational index.17
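
The scoring principle just described, in which the same response carries a different positive, zero, or negative weight for each occupation, can be sketched in Python. Everything specific in the sketch is hypothetical: the occupation names, the item labels, the response codes (L for like, I for indifferent, D for dislike), and the weights are invented for illustration and are not taken from the Strong manuals.

```python
# Hypothetical weights: occupation -> item -> response -> weight.
# None of these values come from the published scoring keys.
WEIGHTS = {
    "engineer": {
        "item_1": {"L": 2, "I": 0, "D": -1},
        "item_2": {"L": -1, "I": 0, "D": 1},
    },
    "lawyer": {
        "item_1": {"L": -2, "I": 0, "D": 1},
        "item_2": {"L": 3, "I": 1, "D": -2},
    },
}


def score_blank(responses: dict, weights: dict = WEIGHTS) -> dict:
    """Score one set of responses separately for every occupation.

    `responses` maps an item label to the response given. Each occupation
    applies its own weight to the same response, and the weights are summed.
    Items that an occupation's key does not cover contribute nothing.
    """
    scores = {}
    for occupation, item_weights in weights.items():
        scores[occupation] = sum(
            item_weights[item][response]
            for item, response in responses.items()
            if item in item_weights
        )
    return scores


responses = {"item_1": "L", "item_2": "D"}
print(score_blank(responses))
```

Because a blank must be passed through every occupational key in turn, the length of the procedure grows with the number of occupations scored, which is the reason the text calls the scoring lengthy.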

Informal measurement of interests. Educational literature of 
recent years includes many reports of interests studies in a variety 
of school subjects and areas of behavior. Among the fields for which 
such studies have appeared in considerable number are reading 
interests in books, magazines, and newspapers; play interests; 
interests in the movies, television, and radio; and interests in various 
subjects of the elementary school and high school. Reference to such 
sources will furnish the teacher much information concerning inter- 
ests of various pupil groups. 

However, the teacher can obtain direct information concerning 
the interests of his pupils by informal methods. Questioning indi- 
vidual pupils and class groups about their interests is a simple 
procedure, and one productive of considerable information. The 
teacher may, however, have the pupils write about their interests or 
list them without discussion. Again, he may prepare and distribute 
to the pupils a list of books, of magazines, of recreational activities, 
or of any one of a number of other types of objects and activities and 
then ask the pupils to check those in which they are interested. In 
any of these procedures, it is wise to limit the investigation of 
interests to one area rather than to attempt a complete inventory of 
pupil interests at one time. 


16 Edward K. Strong, Jr., Manual for Vocational Interest Blank for Men. Stanford
University Press, Stanford University, Cal., August 1938.

17 Edward K. Strong, Jr., Manual for Vocational Interest Blank for Women.
Stanford University Press, Stanford University, Cal., October 1938.


5 MEASUREMENT OF EMOTIONAL ADJUSTMENT


Every individual faces the problem of adjusting himself to a none- 
too-benign environment. Persons who are successful in adapting 
themselves to their environments are well adjusted; those who fail 
in this adaptation become maladjusted. The school seeks to im- 
prove the adjustment of its pupils by furnishing them important 
learning opportunities and experiences. However, it must go beyond 
learning in the classroom sense and attempt to bring about the best 
possible form of adjustment between the individual and his environ- 
ment in terms of his total personality. 

The measurement of adjustment is an extremely comprehensive 
task. In its broad sense such measurement implies the use of all 
types of devices that will furnish information concerning the child 
and his backgrounds of heredity and environment. The discussion of 
adjustment in this section applies primarily to emotional adjust- 
ment. Although this is a fundamentally important issue, because of 
the fact that maladjustment seems to have consequences of great 
importance in the emotional life of the individual, the measurement 
of emotional maladjustment should not be regarded as the sole 
approach to this problem. The discussion in certain portions of 
Chapter 9 deals with adjustment in a somewhat broader sense than 
the treatment given in this section. 


Causes and symptoms of maladjustment 


Maladjustment may arise when an individual is frustrated in the 
satisfaction of his fundamentally important aims, motives, or goals. 
It is the result of a lack of balance between the difficulties the indi- 
vidual encounters in his environment and his ability to meet the 
difficulties successfully. The underlying causes may be of many 
types, and frequently they are very elusive. Frustration itself is a 
result, not a cause. The effects, or results, are much more readily 
determined than are the causes. Symptoms of maladjustment may 
fairly readily be observed by the teacher who has insight into pupil 
behavior, but the determination of causes underlying maladjustment 
is often a task for the clinical psychologist. Although some alleviation 
of maladjustment may be accomplished without knowledge of its 
causes, effective remediation depends upon a knowledge of and 
ability to cope successfully with the true causal factors. 


Methods of adjustment measurement 


The importance of an awareness by the teacher of existent emo- 
tional maladjustments in his pupils should be apparent from the 
preceding discussion. Such recognition of maladjustments should be 
accompanied by evidence concerning their nature, and, if possible, 
their causes. Adjustment inventories serve the first two purposes of 
pointing out the existence of and nature of existing maladjustments 
quite adequately in many instances, but they probably do not ac- 
complish the third purpose, of discovering the causes of maladjust- 
ments. They frequently, however, furnish evidence that will greatly 
facilitate further study of maladjusted pupils in the attempt to 
determine causes and then to eliminate them. 

Three general procedures are probably most often used in the 
measurement of adjustment—personal report blanks, rating scales, 
and projective techniques. Each of these methods is discussed 
briefly and illustrated by a few representative instruments in the 
following pages. 


From Rogers Test of Personality Adjustment 19 


Suppose that just by wishing you could change yourself into any 
sort of person. Which of these people would you wish to be? Write a 
“1” in front of your first choice, a “2” in front of your second choice, 
and a “3” in front of your third choice: 


(a) ______ a housewife          (n) ______ a fireman 
(b) ______ a teacher            ( ) ______ a poet 
(h) ______ a business woman     ( ) ______ [illegible] 
(l) ______ an aviator           (y) ______ a salesman 
(m) ______ a captain            (z) ______ [illegible] 


Is there any other sort of person you would like to be? If there is, 
write it here: 


Personal report blanks. By far the majority of adjustment inven- 
tories make use of the personal report method, by which pupils are 
asked to give answers to a variety of questions. The considerable 
number of adjustment inventories and the wide variety of response 
methods they use preclude any more comprehensive treatment 
here than brief descriptions and illustrations of a few of them. 


19 Carl R. Rogers, A Test of Personality Adjustment for Girls. Published by 
Association Press, 1931. 
An illustration from the Bell Adjustment Inventory, an instrument 
for measuring (1) home, (2) health, (3) social, and (4) emotional 
adjustment, was given on page 60. It will not be discussed further 
here. Sample items from the girls’ form of the Rogers Test of Per- 
sonality Adjustment are given herewith to illustrate procedures used 
in measuring the adjustment of elementary-school children. This 
inventory, for use with girls from nine to thirteen years old, is devised 
to measure adjustment of the girl toward other children, toward her 
family, and toward herself. The comparable form for boys is not 
illustrated here. 

The Aspects of Personality inventory measures the temperament 
and personality traits of children in Grades 4 to 9 by the use of items 
of the type shown in the accompanying illustration. The inventory 
yields scores that can be translated into percentiles on an ascendance- 
submission, an extroversion-introversion, and an emotionality scale. 


Excerpt from Aspects of Personality 20 


SECTION III 
1. I like to go to the movies. 
2. I think most children like to make fun of me. 
3. I get angry about nothing. 
4. I get so angry I can't talk. 


The Guess Who Test, illustrated by the sample given below, is 
intended for use in measuring a child's reputation among his fellows. 
The test, for use from Grade 5 to Grade 8, requests pupils to list their 
classmates who particularly fit the brief portraits presented to them. 
It is possible to obtain a total reputation score for each pupil in a 
class from the results. 


Excerpt from Guess Who Test 21 


Here are some little word-pictures of children you may know. Read 
each statement carefully and see if you can guess who it is about. It 
might be about yourself. There may be more than one picture for the 
same person. Several boys and girls may fit one picture. Read each 
statement. Think over your classmates and write after each statement 


20 Rudolf Pintner and others, Aspects of Personality. Published by World Book 
Co., 1937. 
21 Guess Who Test. Published by Association Press, 1930. 
the names of any boys or girls who may fit it. If the picture does not 
seem to fit anyone in your class, put down no names but go on to the 
next statement. Work carefully and use your judgment. 


1. Here is the class athlete. He (or she) can play baseball, basketball, 
tennis, can swim as well as any, and is a good sport. 


The Mooney Problem Check List, from which an excerpt is shown 
in an accompanying illustration, differs from many personal report 
forms in that it makes no provision for formal pupil scores and no 
norms are provided. Since its major uses are in counseling, surveying 
pupil problems, and research, the use of indicated problems, simple 
counts of problems by areas, and summaries of problems for groups 
of pupils constitute the recommended bases for interpretation of 
results. Local norms are considered to be of greater significance than 
national norms. Therefore, it is suggested by the publisher that they 
be derived as desired. 
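
The simple counts described above can be sketched in a few lines of Python. This is only an illustration: the assignment of item numbers to problem areas in AREA_OF_ITEM is a hypothetical one made up for the example, not the grouping used by the published check list, and the pupil responses are invented.

```python
from collections import Counter

# Hypothetical assignment of item numbers to problem areas. The published
# check list defines its own areas; these labels are assumptions.
AREA_OF_ITEM = {1: "health", 2: "health", 6: "school", 7: "school", 41: "school"}


def count_by_area(underlined_items):
    """Count one pupil's underlined problems in each area."""
    return Counter(
        AREA_OF_ITEM[item] for item in underlined_items if item in AREA_OF_ITEM
    )


def group_summary(responses):
    """Add up the area counts for a group (pupil name -> underlined items)."""
    total = Counter()
    for items in responses.values():
        total += count_by_area(items)
    return total


# Invented responses for three pupils.
responses = {"Pupil A": [1, 6, 7], "Pupil B": [2, 41], "Pupil C": []}
summary = group_summary(responses)
print(summary)
```

A table of such counts for the pupils of one school is one way in which local figures of the kind the publisher suggests might be assembled.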


Excerpt from Mooney Problem Check List 22 


DIRECTIONS: Read the list slowly, and as you come to a problem which 
troubles you, draw a line under it. 


 1. Often have headaches               36. Too short for my age 
 2. Don't get enough sleep             37. Too tall for my age 
 3. Have trouble with my teeth         38. Having poor posture 
 4. Not as healthy as I should be      39. Poor complexion or skin trouble 
 5. Not getting outdoors enough        40. Not good looking 

 6. Getting low grades in school       41. Afraid of failing in school work 
 7. Afraid of tests                    42. Trouble with arithmetic 
 8. Being a grade behind in school     43. Trouble with spelling or grammar 
 9. Don't like to study                44. Slow in reading 
10. Not interested in books            45. Trouble with writing 


Rating scales. Two rating scales that are of major use in locating 
maladjusted pupils are briefly commented upon and illustrated here. 
Although these scales have the same general purposes as the personal 
report blanks discussed above, the two types of adjustment measures 
differ greatly in method. 

The Haggerty-Olson-Wickman Behavior Rating Schedules are illustrated 
by the few following items. Although this scale is similar 


22 Ross L. Mooney, Mooney Problem Check List, Junior High School Form, 1950 
revision. Copyright by Psychological Corporation, 1950. 
in general appearance to a graphic rating scale, it differs in that the 
two extremes do not necessarily represent the most and the least 
desirable situations. Instead, the numbers 1 to 5, variously spaced 


Excerpt from Haggerty-Olson-Wickman Behavior Rating Schedules 23 


25. Is he even-tempered or moody?                          Score: 4 
    Stolid, rare changes of mood (3) 
    Generally very even-tempered (1) 
    Is happy or depressed as conditions warrant (2) 
    Strong and frequent changes of mood (4) 
    Has periods of extreme elations or depressions (5) 

26. Is he easily discouraged or is he persistent? 
    Melts before slight obstacles or objections (5) 
    Gives up before adequate trial (3) 
    Gives everything a fair trial (1) 
    Persists until convinced of mistake (2) 
    Never gives in, obstinate (4) 

27. Is he generally depressed or cheerful? 
    Dejected, melancholic, in the dumps (3) 
    Generally dispirited (4) 
    Usually in good humor (1) 
    Cheerful, animated, chirping (2) 
    Hilarious (5) 


for different items, indicate in descending order the relative desirability 
of the stated condition. 

Another type of instrument that is in effect a rating scale is the 
Baker “Telling What I Do” tests. The accompanying illustration 
from the advanced level test for pupils in Grades 7 to 9 illustrates 


Excerpt from Baker “Telling What I Do” Test 24 


On this sheet you will find many things about yourself. Some of these 
things are known about you already, but we want you to tell us yourself. 

Each exercise has three answers. You are to draw a line under the one 
answer to each exercise that most nearly tells what you do. Put the letter 
of the answer in the parenthesis at the end of the line. 

Underline only one answer to each exercise. Take the one that most nearly 
fits you. Be honest with yourself. Underline what you really do, even if it 
is not what you know you should do. 

There are eighty exercises. Answer all of them. Take your time, and think 
over each exercise carefully. It should take you at least half an hour, or 
longer, to do all the exercises as you really should. 


1. Tardy for school 
   a. Never tardy            b. Often tardy              c. Tardy once in a while  ( ) 
   a. [illegible]            b. Don't care if I lose     c. Try harder next time   ( ) 
   a. [illegible] hurry      b. Eat very fast            c. Eat slowly             ( ) 
   a. [illegible]            b. Don't care about them    c. They bore me           ( ) 
   a. [illegible] pay back   b. Pay back right away      c. Pay when asked         ( ) 


23 M. E. Haggerty, W. C. Olson, and E. K. Wickman, Haggerty-Olson-Wickman 
Behavior Rating Schedules. Published by World Book Co., 1930. 

24 Harry J. Baker, “Telling What I Do,” Advanced Form. Published by Public 
School Publishing Co., 1930. 

the method of measuring pupil behavior. Scores can be obtained for 
the following areas of behavior: (1) school, (2) home, (3) play, 
(4) social, and (5) ethical-moral. 

The Hayes Scale for Evaluating the School Behavior of Pupils 
Ten to Fifteen is another rating scale used either by the teacher in 
rating his pupils or by the pupils in obtaining self-ratings. Directions 
for using the scale and a few sample items are given in the accom- 
panying illustration. The results of a simple scoring procedure can 
be used in preparing a school behavior profile, in locating maladjusted 
pupils, and in spotting the behavior areas in which maladjustment 
seems to exist. 


Excerpt from Hayes Scale for Evaluating School Behavior 25 

Directions for Using this Scale 


Following is a list of habits which children 10 to 15 years old have been found to 
show. No one child could have all the habits listed, but is certain to have a consid- 
erable number of them. 


Draw a circle around the T, F or U before each item to indicate: (T) you 
believe the statement is true of the child being rated; (F) you believe the statement 
is not true of the child being rated; (U) you are uncertain whether the statement 
is true or not true of the child being rated. Be sure to draw a circle about one letter 
and one only for every item in the list. Two samples are given below: 


(T) F  U    usually accepts responsibility when the occasion arises 
 T (F) U    often wastes time 


Circle the following items in a similar manner. 

I 


T F U   1. often does little things to make others happy 
T F U   2. usually thinks of consequences both to self and others 
T F U   3. usually accepts responsibility when the occasion arises 
T F U   4. often shares with others 
T F U   5. usually does his share in any group activity 
T F U   6. often "plays hookey" from school 
T F U   7. usually does the work expected of him 
T F U   8. usually defends his friends only when they are in the right 
T F U   9. usually makes friends easily 
T F U  10. often starts fights 
T F U  11. usually quickly forgives wrongs done to him 
T F U  12. often uses vulgar or profane words 
T F U  13. usually eats lunch with a group 


25 Margaret Hayes, A Scale for Evaluating the School Behavior of Children Ten 
to Fifteen. Published by Psychological Corporation, 1933. 


6 EVALUATIVE TECHNIQUES 


The major instruments discussed above for use in the measure- 
ment of personality are paper-and-pencil instruments. They have 
come to be known as structured inventories because persons filling 
them out respond to the content of the instruments. In contrast 
are the unstructured techniques for evaluating personality. Although 
the various projective methods are most commonly referred to as 
unstructured, certain other evaluation techniques also permit free 
responses by the pupil. The unstructured techniques differ from 
the structured inventories in their direct concern with overt behavior 
of the whole child rather than with verbalized responses to specific 
situations. The two types of unstructured techniques dealt with 
below are used in the evaluation of individual pupils and the dy- 
namics of group behavior. 


Evaluation of individual behavior 


Three evaluative techniques useful in the study of individual pupils 
are considered briefly below. The anecdotal record and the case 
study are appropriately used by classroom teachers, but projective 
techniques should be employed only by psychological examiners, 
school psychologists, clinical psychologists, or other persons with 
technical training in their use. 

Anecdotal records. Teachers have doubtless for generations used 
the anecdotal method in their spare-time discussions about pupils. 
However, its first use as an evaluative instrument was probably as 
recent as 1928.26 The anecdotal record is an objective description by 
the teacher of a significant occurrence or episode in the life of the 
pupil. Unless a situation has sufficient meaning to a teacher who is 
alert to the underlying motives governing human behavior to bring 
it definitely to his attention, it probably is not of sufficient signifi- 
cance for inclusion in the anecdotal record. 

An anecdotal record must be carefully, although not laboriously, 
prepared if it is to be of value. The anecdote is a highly objective 
brief of what occurred in a situation in which a pupil behaved in a 
sufficiently unusual manner to make the incident meaningful. It 


26 D. A. Robertson, chairman, “Report of Subcommittee on Personality Measure- 
ment.” Educational Record, 9:53-68, Supplement No. 8; July 1928. 
may consist of an objective narrative of the incident only or it may 
consist of the narrative, an impartial interpretation of the occur- 
rence, and, as a possible third stage, even a recommendation for 
guidance of the pupil concerned. If interpretations and recommenda- 
tions are given, however, they should be distinguished from the 
original description so that their nature is clearly apparent to a 
person reading the anecdotal record. The anecdotal record has great 
value only when it is made cumulative by the addition of new 
anecdotes as meaningful situations arise and are observed and re- 
corded by the teacher or some other school officer. 

Case study. The case study is a broad and comprehensive ap- 
proach to the problems of pupil behavior. It should include exten- 
sive information about the present status of the pupil as well as about 
his past experiences and his family background. In fact, the case 
study may well draw upon many or even all of the types of informa- 
tion contained in adequate cumulative pupil records. 

Usually there is a specific reason for making a case study. Such an 
approach may be used to gain a better understanding of a failing 
pupil, or a pupil who is poorly adjusted in one or another of many 
possible ways. 

Projective techniques. A simple characterization of projective 
techniques is that they attempt to induce the child to reveal his 
personality through his free responses to situations that can be 
observed by the psychologist. Bell stated that the “purpose of pro- 
jective techniques is to gain insight into the individual personality,” 
and that their method is “to reveal the total personality, or aspects 
of the personality in their framework of the whole.” 27 Techniques 
classified as projective differ widely in the materials used, the meth- 
ods of presentation to the pupil, and the methods of interpreting 
the pupil’s behavior, but all are intended to bring forth behavior 
representative of the inner personality and to permit the psychologist 
to draw inferences concerning intrinsic motives. 

Probably most widely used in this area are the Rorschach test and 
the Thematic Apperception Test. The Rorschach makes use of pupil 
interpretations of inkblots and the TAT employs a wide variety of 
pictures as the basis for pupil responses. The Szondi Test employs 
photographs of persons and uses pupil choices of likes and dislikes 
as the basis for analysis. Some of the other projective techniques 


27 John E. Bell, Projective Techniques: A Dynamic Approach to the Study of 
the Personality. Longmans, Green and Co., New York, 1948. p. 4. 
involve drawing or painting, play, handwriting, the completion of 
pictures, and dramatic productions. Sims stated that the essay 
examination is a projective technique under some conditions.28 

As only a trained psychologist should attempt to employ these 
projective techniques, this brief discussion is intended to familiarize 
the student with the general nature of a few of the most widely used 
projective methods. Teachers occasionally may encounter situations 
in which maladjusted pupils are studied by the use of these tech- 
niques, so they should be sufficiently familiar with the general pro- 
cedures involved to be intelligent users in subsequent pupil guidance 
of the interpretations made by the examining psychologists. 


Evaluation of group dynamics 


Two methods are now used quite widely in studying the behavior 
of the whole child in settings involving interactions among members 
of social groups. Information is thereby obtained concerning the 
place of the individual within the group and concerning group 
behavior as influenced by the contributions of the individual mem- 
bers. The methods briefly discussed below involve the use of the 
sociogram and of analyses of group interactions. Both have been 
influenced considerably by developments in the field of sociology. 

The sociogram. When groups of individuals are thrown together, 
as in a grade group of pupils in the elementary school or a homeroom 
group or class in the high school, some type of social relationship 
inevitably exists between each pupil and every other pupil indi- 
vidually. The possible range of relationships is from that involved 
in very close friendships to that of rejection. However, the variety 
of social situations is so wide that a pupil who is rejected by another 
in a particular social situation may be sought out in another, and 
quite different, social framework. For example, a boy preferred by a 
certain teammate as captain of the football team might be rejected 
by the same teammate as a member of a debating team. 

The sociometric method which leads to the production of a 
sociogram as the end product is quite simple to apply. Most typically 
each pupil in the group is asked to name his first, second, and perhaps 
third choices among other members of the group in several signifi- 
cant and pertinent types of social settings. Questions asking for the 


28 Verner M. Sims, “The Essay Examination Is a Projective Technique.” Educa- 
tional and Psychological Measurement, 8:15-31; Spring 1948. 


expression of individual preferences for class president, the occupant 
of an adjacent seat in the homeroom, or a member of a committee 
merely illustrate the wide range of possibilities. 

The sociogram is used to represent the results graphically. Prior 
to its preparation, the results in response to a certain question are 
analyzed by any one of several methods, ranging from the use of a 
tally sheet to the employment of cards or slips of paper that can be 
sorted.29 When these results, showing first, second, and third choices, 
have been organized and the pupils ranked from highest to lowest 
in frequencies of choice, the sociogram can be constructed. Usually 
the pupils most often chosen are represented near the center of the 
sociogram and those rejected or least often chosen near the margins. 
Mutual choices, i.e., pupils choosing each other, should be repre- 
sented by closely adjacent figures. The lines showing choices should 
be as short as reasonably possible and intercrossing of lines should be 
kept at a minimum. First, second, and third choices should be desig- 
nated by numbers or by different types of lines. Boys are often 
distinguished from girls by the use of figures such as circles and 
triangles. The symbols should contain pupils' names or initials for 
ready identification. The application of these principles is shown in 
Figure 21, which represents the social interactions of a group of 
elementary-school pupils. 
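
The tallying that precedes the drawing of a sociogram can be sketched as a short Python example. It is offered only as an illustration of the steps described above: the pupil names and choices are invented, each pupil names two choices rather than three to keep the example short, and the function names are assumptions of this sketch rather than part of any published procedure.

```python
from collections import Counter

# Invented data: each pupil lists his choices in order of preference.
choices = {
    "Ann": ["Bea", "Cal"],
    "Bea": ["Ann", "Dot"],
    "Cal": ["Bea", "Ann"],
    "Dot": ["Bea", "Cal"],
    "Eli": ["Bea", "Dot"],
}


def times_chosen(choices):
    """Tally how often each pupil is named, keeping unchosen pupils at zero."""
    counts = Counter({pupil: 0 for pupil in choices})
    for named in choices.values():
        counts.update(named)
    return counts


def ranked(choices):
    """Rank pupils from most to least often chosen (ties in name order)."""
    counts = times_chosen(choices)
    return sorted(counts, key=lambda pupil: (-counts[pupil], pupil))


def mutual_pairs(choices):
    """Find pairs of pupils who chose each other."""
    pairs = set()
    for chooser, named in choices.items():
        for chosen in named:
            if chooser in choices.get(chosen, []):
                pairs.add(tuple(sorted((chooser, chosen))))
    return sorted(pairs)


print(ranked(choices))
print(mutual_pairs(choices))
```

The ranked list suggests which symbols belong near the center of the diagram and which near the margins, and the mutual pairs indicate which symbols should be placed close together.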

The preparation and subsequent study of several related socio- 
grams for a class or homeroom group should add materially to the 
teacher's understanding of social relationships among his pupils and 
enable him to use the results in significant ways to take into account 
the relationships found and to take remedial action where conditions 
warrant. 

Direct observation. The interactions of the individual members 
of a small group working on a common problem can be evaluated by 
the use of observational techniques. Groups should be small, say not 
larger than twenty, and should be working on cooperative projects 
in which interaction rather than individual work is entailed.30 Although 
interaction process analysis is not a distinctly new technique, 
its methodology has been improved materially during the last ten 
years. 


29 Helen H. Jennings, Sociometry in Group Relations: A Work Guide for Teach- 
ers. American Council on Education, Washington, D. C., 1948. p. 17-21. 

30 Robert F. Bales, Interaction Process Analysis: A Method for the Study of 
Small Groups. Addison-Wesley Press, Inc., Cambridge, Mass., 1950. p. i. 


[Figure 21: sociogram drawing not reproduced.] 

LEGEND 

Symbols shown: boy; girl; one-way choice; mutual choice. 
1, 2, or 3 = order of choice 

Note: For an absent boy or girl, use the respective symbol dashed, 
leaving any choice line open-ended (see the case of Joe Brown in the 
above sociogram). 

If rejections are obtained, the choice line may be made in dashes or in 
a different color. 

Whenever a direct line from chooser to chosen cannot be drawn without 
crossing through the symbol for another individual, the line should be 
drawn with an elbow, as in the case of Bill Lane to Paula King. 

Fig. 21. Sample sociogram 31 

The observer, in studying group interactions, records in sequence 
the actions of the group members and classifies each item of observed 
behavior by interpreting it as belonging to one or another of the 
behavior types decided upon in advance as significant. An illustration 


31 Jennings, op. cit. p. 22. 
of behavior categories used in such situations appears in Figure 22. 
An interaction profile making use of these behavior categories and 
various other devices is used in summarizing the results and afford- 
ing a basis for interpretation. 
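
The tallying that underlies such a profile can be sketched in Python. This is a minimal illustration and not the published procedure: the observations are invented, the function name is an assumption of the sketch, and the three areas simply follow the grouping of categories 1 to 3, 4 to 9, and 10 to 12 shown in Figure 22.

```python
from collections import Counter

# Grouping of the twelve categories into the three areas of Figure 22.
AREAS = {
    "social-emotional, positive": range(1, 4),
    "task, neutral": range(4, 10),
    "social-emotional, negative": range(10, 13),
}


def interaction_profile(observations):
    """Tally observed acts by category and give each area's share in per cent.

    observations is a sequence of (member, category number) pairs recorded
    in the order in which the acts occurred.
    """
    if not observations:
        raise ValueError("no observations recorded")
    by_category = Counter(category for _, category in observations)
    total = len(observations)
    by_area = {
        area: 100 * sum(by_category[c] for c in categories) / total
        for area, categories in AREAS.items()
    }
    return by_category, by_area


# Invented record of eight acts by three group members.
observations = [
    ("A", 6), ("B", 7), ("A", 5), ("C", 3),
    ("B", 10), ("A", 4), ("C", 2), ("B", 6),
]
by_category, by_area = interaction_profile(observations)
print(by_area)
```

The same record could be tallied separately for each member when the contribution of an individual to the group is of interest.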


Social-Emotional Area: Positive (A) 
   1  Shows solidarity, raises other's status, gives help, reward 
   2  Shows tension release, jokes, laughs, shows satisfaction 
   3  Agrees, shows passive acceptance, understands, concurs, complies 

Task Area: Neutral (B) 
   4  Gives suggestion, direction, implying autonomy for other 
   5  Gives opinion, evaluation, analysis, expresses feeling, wish 
   6  Gives orientation, information, repeats, clarifies, confirms 

Task Area: Neutral (C) 
   7  Asks for orientation, information, repetition, confirmation 
   8  Asks for opinion, evaluation, analysis, expression of feeling 
   9  Asks for suggestion, direction, possible ways of action 

Social-Emotional Area: Negative (D) 
  10  Disagrees, shows passive rejection, formality, withholds help 
  11  Shows tension, asks for help, withdraws out of field 
  12  Shows antagonism, deflates other's status, defends or asserts self 


KEY: 

a  Problems of Communication          A  Positive Reactions 
b  Problems of Evaluation             B  Attempted Answers 
c  Problems of Control                C  Questions 
d  Problems of Decision               D  Negative Reactions 
e  Problems of Tension Reduction 
f  Problems of Reintegration 


Fig. 22. Behavioral categories and their major relations 32 


32 Bales, op. cit. p. 9. 

As is true for projective methods, the teacher should not attempt 
to make direct use of these techniques for studying group interac- 
tions. However, sufficient insight into the purposes and general 
methods involved is probably given by the list of behavior categories 
to provide the teacher with a basis for a better understanding of 
problems involved in group behavior. 


7 MEASUREMENT OF TOTAL PERSONALITY 


No very definite line of distinction can be drawn between the 
instrument to be discussed here and the types of adjustment inven- 
tories presented in the preceding section. However, the instrument 
presented here perhaps measures total personality rather than various 
aspects of personality. In its provision of means for determining a 
personality quotient (PQ), essentially comparable in significance to 
the IQ and EQ, there is at least an implication that it measures per- 
sonality more broadly than do most of the currently available adjust- 
ment inventories. 

McCall developed an Inter-Trait Rating Scale that yields per- 
sonality quotients in each of 43 areas of behavior that reflect per- 
sonality and also an average PQ. The scale can be used for persons 
12 years of age and older as a self-rating instrument or for obtaining 
ratings by friends. The essential feature of McCall’s procedure is to 
compare the amount of each trait possessed by an individual with 
the amount of some objectively measurable trait, such as intelligence, 
that he possesses. 

The first two rows of the accompanying copy of the Inter-Trait 
Rating Scale are filled in with ratings for a hypothetical individual 
who has an IQ of 115, as a basis for showing how the scale is used. 
Because the rater feels that the individual is lower in accuracy than 
in intelligence, he places a “-” in the second column. He estimates 
that his judgment on the individual's accuracy is about 40 per cent 
of certainty. Consequently, he writes “40” in the third column. The 
PQ column is then filled out by taking half of the percentage of 
certainty, or 20, and, because the sign in the second column is negative, 
subtracting it from the IQ of 115 to obtain a PQ of 95. On adaptability, 
the plus rating with a 30 per cent degree of certainty indicates 
that 15 should be added to the IQ of 115, to net a PQ of 130. The 
average of the 43 separate PQs then becomes the general PQ for the 
individual. 
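
The arithmetic of the two sample ratings can be restated as a brief Python sketch. The function names are assumptions made for this illustration and are not McCall's; the figures are those of the hypothetical individual described above.

```python
def trait_pq(iq, sign, certainty_pct):
    """Personality quotient for one trait: the IQ plus or minus half the
    rater's percentage of certainty, according to the sign entered."""
    if sign not in ("+", "-"):
        raise ValueError("sign must be '+' or '-'")
    half = certainty_pct / 2
    return iq + half if sign == "+" else iq - half


def general_pq(iq, ratings):
    """Average of the separate trait PQs (trait -> (sign, per cent))."""
    pqs = [trait_pq(iq, sign, pct) for sign, pct in ratings.values()]
    return sum(pqs) / len(pqs)


# The two rows filled in on the sample scale, for an IQ of 115.
ratings = {"Accuracy": ("-", 40), "Adaptability": ("+", 30)}
print(trait_pq(115, "-", 40))    # 115 - 20
print(trait_pq(115, "+", 30))    # 115 + 15
print(general_pq(115, ratings))  # average of these two traits only
```

On the full scale the average would be taken over all of the rated traits rather than the two shown here.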
McCall Inter-Trait Rating Scale 33 


Trait                 Above or Below    Percent of    Personality Quotients 
                      Intelligence      Certainty     (½ the % plus IQ) 

Accuracy                    -               40              95 
Adaptability                +               30             130 
Appearance 
Cheerfulness 
Conscientiousness 

Cooperativeness 
Courage 
Courtesy 
Decisiveness 
Democracy 

Effectiveness 
Enthusiasm 
Foresight 
Generosity 
Happiness 

Healthiness 
Independence 
Industriousness 
Initiative 
Leadership 

Likeableness 
Loyalty 
Open-Mindedness 
Orderliness 
Originality 

Persistence 
Pleasing Voice 
[illegible] 
Progressiveness 
Punctuality 

Refinement 
Reliability 
Self-Confidence 
Self-Control 
Sense of Humor 

Sincerity 
Sociability 
Sympathy 
[illegible] 

Tolerance 
Truthfulness 
Vivacity 

33 William A. McCall, Measurement. Macmillan Co., New York, 1939. p. 315. 

McCall stated that the embarrassment that sometimes arises 
when one person is asked to rate a friend in the friend's presence 
will not arise with this rating scale. He said, in commenting upon the 
manner in which some friends rated him: 34 


Since they could not rate him down in accuracy without rating him 
up in intelligence or up in adaptability without rating him down in in- 
telligence there was no particular embarrassment to them or him in these 
ratings, although the author does not see himself as others see him at 
certain points. They were not asked to state whether the author was very 
dull or very intelligent or very accurate or inaccurate, nor even to state 
how much difference there is between his intelligence and his accuracy. 


Topics for Discussion 


1. In what way is a knowledge of personality measurement procedures 
valuable to the teacher? 
2. What is meant by personality? How do psychologists and laymen 
differ in their conceptions of personality? 
3. Briefly characterize two association methods of evaluating behavior. 
4. Indicate the nature of observation procedures for the evaluation of 
personality. 
5. What is the nature of graphic rating scales? 
6. How are personal reports used in personality measurement? 
7. Briefly indicate the nature of attitudes. Of what concern are they 
to the teacher? 
8. Indicate the nature of one or two attitudes scales for use in the 
elementary or secondary school. 
9. What is the nature of interests? How are pupil interests of signifi- 
cance to the teacher? 
10. Discuss the two major procedures used in the measurement of 
interests. 
11. What are the causes and symptoms of emotional maladjustment? 
Which are easier to recognize? Why? 
12. Indicate some of the methods by which pupil adjustment is meas- 
ured. 
13. What are three major methods of evaluating individual behavior? 
14. How should the teacher expect to be involved in the administration 
and use of results from projective techniques? 
15. What are some of the modern methods for evaluating group 
dynamics? 
16. In what ways can the classroom teacher appropriately use socio- 
grams? 
34 Ibid. p. 314 
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17. Discuss how total personality is measured by one technique. What 
is the PQ? 
Summarizing the Results
of Measurement


The following points in the summarization of test results are
considered in this chapter:

1. Statistical procedures in summarizing test results.
2. Tabulation of test scores.
3. Common measures that express typical performance.
4. Common measures of spread or variability.


It is common knowledge that a wide range of ability may be 
expected from the different individuals in a given class and that 
scores representing objective measures of achievement or other
traits will vary widely. Since the human mind is not able to grasp 
and hold numerous unlike facts in isolation, accurate description of 
test results depends upon their statistical summarization. Summaries 
and descriptions of this type need not disturb the student, for after 
all most of these elementary statistical procedures are simple. 
Actually, the main requirements are the learning of a new and dif- 
ferent type of vocabulary and the use of a few relatively simple 
arithmetic skills. 

The use of statistical methods in the analysis of test results is 
directly in line with good scientific technique. Scientific method in 
handling test results involves: 


(1) The collection of facts. Within the limits of accuracy of the tests
used, the test scores may be said to represent facts.

(2) The classification and organization of the facts. Simple statistical
practices of grouping and tabulating data are utilized for this
purpose.

(3) The further reduction and analysis of the data. Such common
statistical procedures as determining measures of central tendency,
variability, and relationship are required at this point.

(4) The interpretation of test data. Graphical representations and
various types of derived scores are involved here.

(5) The validation of tests. Procedures for determining the validity,
reliability, and objectivity of tests illustrate this need.


The most important statistical techniques from the standpoint of
the frequency of their use in education involve abilities to: (1) classify 
and tabulate data, (2) determine and apply the common measures 
of central tendency, (3) determine and apply the common measures 
of spread or variability, (4) secure derived scores and use them in 
the interpretation of test results, (5) utilize graphical methods in 
the presentation and interpretation of test results, and (6) apply 
correlational procedures in determining the relationship between two 
sets of data. The discussion and explanation of these techniques con- 
stitute the major portions of this and the two following chapters. 

This relatively large amount of emphasis is given to these points 
for two reasons: (1) Successful and satisfactory work with test re- 
sults can be expected only when the person using them is adequately 
equipped to understand and interpret them. Such abilities are de- 
pendent on a reasonable mastery of these elementary statistical 
techniques. (2) Current educational literature in practically all 
fields is literally filled with the terms and the techniques discussed 
here. Reports of progress in education are dependent on statistical 
methods. If the teacher and the student are to keep up to date
educationally, they must develop the ability to read with understanding
the practical aspects of statistical discussions in current educational
literature.

1 CLASSIFICATION AND TABULATION OF TEST SCORES 


Need for a method of grouping data 


The very fact that people are unlike physically and mentally gives
rise to the need for statistical methods in psychology and education.
For example, it may be observed readily from Table 9 that there
are great differences in the scores made by the thirty-seven pupils
who took a certain reading test. However, it requires rather careful
scrutiny to determine that the highest and lowest scores are
respectively 72 and 24, while very little further information can be
obtained from these scores without rearranging them.


TABLE 9. Reading test scores of 37 ninth-grade pupils in alphabetical 
order of last names 


The relatively simple practice of arranging test scores in order of
size from highest to lowest is helpful, however. Table 10 reproduces
the reading test scores of the thirty-seven pupils in descending order.
It now is more easily apparent than from Table 9 that the highest
and lowest scores are respectively 72 and 24, while it can also rather
easily be determined that the middle score, or midscore, is 48.


TABLE 10. Reading test scores of 37 ninth-grade pupils in descending 
order 


Four consistent ways in which these same thirty-seven scores 
can be classified into a frequency distribution are shown in Table 11. 
The first illustration, in which the scores retain their individual 
identities, may be called a simple or ungrouped frequency distribu- 
tion. The other three illustrations, in which the grouping of scores 
destroys the individual identities of most of them, are called grouped 
frequency distributions. The first distribution furnishes the basis for 
obtaining detailed information concerning these scores, but such 
information would be rather costly in the time required to derive it. 
The fourth illustration furnishes the basis for obtaining quick but
quite unsatisfactory information, for the very rough grouping almost
entirely sacrifices even the approximate identity of the individual
scores. The second and third illustrations, neither of which demands
an undue time expenditure in order to obtain accuracy nor sacrifices 
accuracy for a saving in time and labor, represent satisfactory prac- 
tices midway between the two extreme methods of handling the 
scores. The second is somewhat preferable to the third for these data. 


TABLE 11. Reading test scores of 37 ninth-grade pupils in frequency 
distributions 


Intervals of 1 Unit: the 37 scores listed singly, from 72 down to 24,
each with its frequency (the individual frequencies are illegible in
this copy).

 Intervals of       Intervals of       Intervals of
   3 Units            5 Units            15 Units
Scores    f        Scores    f        Scores    f
71-73     1        68-72     3        68-82     3
68-70     2        63-67     2        53-67     9
65-67     1        58-62     3        38-52    20
62-64     1        53-57     4        23-37     5
59-61     2        48-52     9
56-58     2        43-47     8
53-55     3        38-42     3
50-52     4        33-37     2
47-49     6        28-32     2
44-46     5        23-27     1
41-43     3
38-40     2
35-37     1
32-34     2
29-31     1
26-28     0
23-25     1

In the preparation of a frequency distribution, the method of 
grouping test scores is not dissimilar to that followed by postal 
clerks in the distribution of outgoing letters in a large railway postal
terminal. Mail designated for certain sections of the country or for 
certain large centers from which it is redistributed is thrown into 
the proper mail pouches. Some pouches, however, contain mail ad- 
dressed to a number of post offices in the same section of the country. 
For example, in the Chicago postal terminal, mail addressed to post 
offices in the Pacific northwest may be consigned to a certain group 
of pouches, while mail going to the southwest section of the United
States will be consigned to other pouches representing that section 
of the country. The number of pouches required depends on the 
number of pieces of mail to be distributed and also on the population 
of the section of the country that can be most efficiently served by 
a given pouch. Increasing the number of pouches naturally increases 
the labor involved in sorting the pieces of mail, but at the same time 
it increases the accuracy of the distribution. Mail in Chicago might 
be sorted into two classes—eastbound and westbound. This would 
introduce a large error, since not all sections of the country would 
be effectively served by this rough classification. The other extreme, 
of using at this point a separate pouch for the mail addressed to each 
post office, would be entirely impracticable. 


Classifying and tabulating scores 


The foregoing illustration will give the reader a clear conception of 
the purposes and the problems involved in grouping test scores into 
a frequency table. The steps of procedure presented in the following 
paragraphs are for the student's guidance in the preparation of 
grouped frequency distributions of test scores. 

(1) Determine the range. Find the highest score and the lowest
score in the series. Find the difference between these scores. The
difference is called the range (R) of the distribution. In the case
of the scores given in Table 9, the range is 72 - 24, or 48. The
range is useful in determining the number of class intervals to use
in the frequency table.

(2) Determine the size of the class intervals. Use the range to 
determine whether the scores should be classified by units of 1, 3, 5, 
7, or 15, i.e., to determine the size of the class intervals. For the range
of 48 found for the scores of Table 9, a practical grouping is by 
intervals of 3 units each. 

No special rule relative to the number of class intervals to be used 
in a frequency table can be stated. However, it is usually unwise to 
group data into fewer than ten to twelve class intervals because of 
the greater error of grouping as the number of intervals is decreased. 
Likewise, it is usually undesirable to use more than twenty to twenty- 
five class intervals because of the increased labor involved. The main 
idea of grouping the scores into approximately twelve to twenty or 
so intervals is to classify the scores into a sufficiently small number
of groups that they may be thought about effectively and yet not into 
so few groups that important differences are covered up or significant 
errors are introduced. 


TABLE 12. Relation between range of scores and size of class intervals


For a Range of      Use a Class Interval of

25 or less                     1
26 to 69                       3
70 to 125                      5
126 to 175                     7
176 or more                   15
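
The first two steps can be sketched in a few lines of Python. This is only an illustration: the function names `score_range` and `class_interval_size` are assumptions introduced here, and the cut-off values are the ones this chapter states for ranges of 25 or less, 26 to 69, 70 to 125, 126 to 175, and 176 or more.

```python
def score_range(scores):
    """Range R: the highest score minus the lowest score."""
    return max(scores) - min(scores)


def class_interval_size(r):
    """Class-interval size suggested by the chapter's rule for a range r."""
    if r <= 25:
        return 1
    if r <= 69:
        return 3
    if r <= 125:
        return 5
    if r <= 175:
        return 7
    return 15


# The reading scores run from 24 to 72, so the two extremes suffice here.
r = score_range([72, 24])
size = class_interval_size(r)
print(r, size)
```

For the reading test scores the range of 48 falls in the band 26 to 69, so the sketch selects intervals of 3 units, as in the text.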


(3) Set up the frequency table. Construct the table into which the
scores are to be tabulated by use of the following steps of procedure:

(a) Label three column headings c.i., Tabulation, and f. The
first and third headings are commonly used abbreviations for “class
interval” and “frequency” respectively.

(b) Determine the limits of the highest class interval. First
find the multiple of the class interval that is closest to or equal to
the highest score in the series. This number is the midpoint of the
highest class interval. Then establish the integral, i.e., whole-number,
limits of the interval equal distances above and below this midpoint,
so that the distance between the integral limits is one scale unit less
than the size of the class interval. The real limits of the interval are
then found by continuing upward .5 of a score unit above the higher
integral limit and downward .5 of a score unit below the lower
integral limit. Figure 23 shows how the midpoint, integral limits, and
real limits appear for this highest class interval.


73.5    Higher Real Limit
73      Higher Integral Limit
72      Midpoint
71      Lower Integral Limit
70.5    Lower Real Limit


Fig. 23. Midpoints and limits of a class interval 
(c) Write the integral limits of the class intervals in the c.i.
column. Start at the top with the interval that will include the
highest score and continue downward consistently to include at the
bottom the interval that will include the lowest score.

At this point it may help to refer again to the scores given in 
Table 9. For these data it was determined above that the grouping 
should be by class intervals of 3 units each. Since the highest score, 
72, is exactly divisible by 3, 72 is the midpoint of the highest class 
interval. Then 71 and 73 will become the lower and higher integral 
limits respectively, and 70.5 and 73.5 will become the lower and 
higher real limits respectively, of the class interval. The next lower 
interval will have a midpoint of 69, integral limits of 68 and 70, and 
real limits of 67.5 and 70.5. The first three columns of Table 13 
show these various points for each interval for the entire distribution 
based on the scores of Table 9.


TABLE 13. Reading test scores of 37 ninth-grade pupils in a grouped
frequency distribution


        Class Interval (c.i.)
Integral       Real         Mid-      Tabulation     f
Limits         Limits       point

71-73       70.5-73.5       72        /              1
68-70       67.5-70.5       69        //             2
65-67       64.5-67.5       66        /              1
62-64       61.5-64.5       63        /              1
59-61       58.5-61.5       60        //             2
56-58       55.5-58.5       57        //             2
53-55       52.5-55.5       54        ///            3
50-52       49.5-52.5       51        ////           4
47-49       46.5-49.5       48        ///// /        6
44-46       43.5-46.5       45        /////          5
41-43       40.5-43.5       42        ///            3
38-40       37.5-40.5       39        //             2
35-37       34.5-37.5       36        /              1
32-34       31.5-34.5       33        //             2
29-31       28.5-31.5       30        /              1
26-28       25.5-28.5       27                       0
23-25       22.5-25.5       24        /              1

N = 37
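
The rule for locating the midpoint, integral limits, and real limits of each class interval can be sketched in Python as follows. The function name `class_intervals` and the layout of each row are assumptions made for this illustration; the sketch covers only the odd interval sizes used in this chapter (1, 3, 5, 7, and 15).

```python
def class_intervals(highest, lowest, size):
    """List the class intervals from the one holding `highest` down to
    the one holding `lowest`, for an odd interval size.

    Each row is (lower integral limit, upper integral limit,
    lower real limit, upper real limit, midpoint).
    """
    half = (size - 1) // 2
    # Multiple of `size` closest to the highest score; with an odd size
    # a whole-number score can never be exactly halfway between multiples.
    midpoint = (highest + half) // size * size
    rows = []
    while True:
        low, high = midpoint - half, midpoint + half
        rows.append((low, high, low - 0.5, high + 0.5, midpoint))
        if low <= lowest:
            break
        midpoint -= size
    return rows


rows = class_intervals(72, 24, 3)
for row in rows[:2]:
    print(row)
print(len(rows), "intervals; lowest:", rows[-1])
```

With a highest score of 72, a lowest score of 24, and intervals of 3 units, the sketch yields seventeen intervals running from 71-73 down to 23-25.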


The above directions and illustration will be clarified if the terms
are reviewed and defined. The class interval is the group, or
compartment, within the limits of which given scores are assigned. The
midpoint is a point midway between the upper and the lower limits
of the interval. The integral limits are the limits or boundaries of
the interval in terms of whole numbers. The real limits are the actual
boundaries of the interval. For convenience in tabulation it is found
desirable to choose the limits of the interval in such a way that the
midpoint is a whole number. This, of course, makes it necessary that
the upper and lower real limits be fractional values whenever
odd-sized intervals are used. Many statisticians prefer this method
because they recognize that, although test scores are usually not
given in fractional values, a score of a certain value, say 72, might,
if the measurement were more accurate, equally well represent a
score a fraction above 72 or a fraction below 72. A score of 72, then,
represents any score between 71.5 and 72.5. This method has the
merit of furnishing a natural location on the scale for all scores
expressed in whole numbers.¹

(4) Tabulate the scores. Begin with the first score in the original
list of scores. Determine in which class interval this score will be
included. Place a tally mark in the Tabulation column of the
appropriate class interval. Make another tally mark for the second
score in the interval in which it is included. Continue thus until a
tally mark has been made for each score in the series. Make each
fifth mark in any interval a slanting mark across the preceding four
tally marks. Complete the frequency distribution by totaling the
tally marks in each row and writing the proper number for that row
in the f column, and then by obtaining the sum of these frequencies.
This sum, N, should equal the total number of original scores.

1 This represents one of the two most common assumptions made in
statistical work concerning the meaning of a test score. The other
widely used statistical method assumes that the true score, of which
the test score actually obtained is only an estimate, is not likely to
be less than the obtained score but may lie anywhere between the
obtained score and a score one unit greater. For example, a score of 72
represents a true score somewhere from 72 to 72.9999...

The authors believe that the method used in this volume represents a
more defensible assumption concerning the meaning of a test score.
However, instructors preferring to use the other method can do so
easily by shifting each midpoint and each real limit of a class
interval .5 of a score point upward from the values given in this and
the two following chapters. For example, the real limits of 70.5 and
73.5 for the interval 71-73 would become 71.0 and 73.9999... in the
other method, and the midpoint of 72 for the same interval would become
72.5 in the other method. It should be noted, also, that this procedure
results in an arithmetic mean that is exactly .5 larger than for the
method used in this volume.


Summary of steps in classifying and tabulating scores 


The classroom teacher will sometimes find the construction of a 
frequency table unnecessary in his experience with tests, since he 
often works with small numbers of cases and can check the scores 
from the papers themselves. However, there are many occasions 
when the frequency distribution is necessary. It is an effective way 
of recording and preserving the results of using tests in the class- 
room. It makes possible a number of short cuts in the calculation of 
certain statistical measures useful in interpreting test results. The 
methods by which these measures are computed are given in succeed- 
ing sections of this chapter. This section summarizes in concise form 
the steps of procedure in setting up a frequency table and in tabulat- 
ing scores. 


(1) Determine the range. Find the highest score and the lowest score 
and obtain the difference between them. (R) 


(2) Determine the size of the class intervals. If the result of step (1)
is: 25 or less, use a class interval of 1; 26 to 69, use a class
interval of 3; 70 to 125, use a class interval of 5; 126 to 175, use a
class interval of 7; and 176 or more, use a class interval of 15. (c.i.)


(3) Set up the frequency table: 

(a) Label three column headings c.i., Tabulation, and f,

(b) Determine the limits of the highest class interval so that its 
midpoint is divisible by, and the distance between its real 
limits is equal to, the size of the interval determined in step 
(2), and 

(c) Write the integral limits of the class intervals in the c.i. col- 
umn, starting at the top with the interval that contains the 
highest score and continuing downward consistently to include 
the interval that contains the lowest score. 


(4) Tabulate the scores. Place a tally mark in the Tabulation column 
opposite the c.i. that indicates the proper position for each score, 
carry across the total of the tally marks in each class interval to 
the f column, and add the frequencies in the f column to obtain the 
total number of cases. (N) 
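
The tabulation step can also be carried out mechanically. The sketch below is one possible arrangement, not a prescribed procedure: the function name `tabulate` and the twelve scores are invented for this illustration (their range is 42, so intervals of 3 units apply), and Python's `collections.Counter` stands in for the tally marks.

```python
from collections import Counter


def tabulate(scores, size):
    """Grouped frequency distribution for an odd class-interval size.

    Returns (rows, n). Each row is (lower integral limit,
    upper integral limit, frequency), from the highest interval down.
    """
    half = (size - 1) // 2

    def midpoint_of(score):
        # Midpoint of the interval holding `score`: the nearest multiple
        # of `size`.
        return (score + half) // size * size

    tally = Counter(midpoint_of(s) for s in scores)
    top, bottom = midpoint_of(max(scores)), midpoint_of(min(scores))
    rows = [(m - half, m + half, tally[m])
            for m in range(top, bottom - 1, -size)]
    n = sum(f for _, _, f in rows)
    return rows, n


# Hypothetical scores, invented for this example only.
scores = [72, 65, 58, 58, 51, 50, 49, 47, 44, 40, 36, 30]
rows, n = tabulate(scores, 3)
for low, high, f in rows:
    print(f"{low}-{high}  {f}")
print("N =", n)
```

Intervals that hold no scores, such as 68-70 here, still appear with a frequency of 0, and N equals the number of original scores, which serves as the check called for in step (4).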
Exercises in Tabulating Test Scores


1. Set up a frequency table and tabulate the arithmetic test scores listed
below for 40 fifth-grade pupils.
59, 48, 38, 66, 57, 42, 51, 66, 75, 53, 55, 47, 41, 35, 49, 15, 63,
55, 51, 44, 79, 66, 57, 58, 51, 45, 52, 50, 48, 72, 51, 5%, 64, 59,
53, 42, 53, 55, 58, 61.
2. Set up a frequency table and tabulate the language test scores listed
below for 30 tenth-grade pupils.
56, 47, 60, 52, 39, 65, 41, 81, 69, 30, 21, 64, 44, 55, 28, 11, 6,
24, 49, 45, 10, 49, 46, 39, 24, 64, 29, 34, 46, 34.
3. Set up a frequency table and tabulate the spelling test scores listed
below for 30 ninth-grade pupils.
19, 14, 11, 9, 17, 15, 13, 13, 6, 16, 21, 10, 12, 18, 11, 13, 10, 5, 8,
15, 14, 18, 4, 11, 12, 13, 22, 6, 21, 14.


2 MEASURES OF CENTRAL TENDENCY 


Need for measures of central tendency 


The grouping of test scores into frequency tables is one step in 
the process of condensing them so that they can be analyzed and 
interpreted. However, a further step must be taken before it is
possible to describe the data. Some single term or value that is repre- 
sentative of the entire table must be found. Since these values which 
may be taken to represent an entire distribution of scores are usu- 
ally found near the center of the data when arranged in order of size, 
they are commonly called measures of central tendency. The three 
common measures of central tendency are: (1) the arithmetic mean,
(2) the median, and (3) the mode. Of these three measures of central 
tendency, the median and the arithmetic mean are used almost ex- 
clusively in educational measurements and are, accordingly, the 
only ones emphasized in this discussion. 


Computing the arithmetic mean of ungrouped data 


The arithmetic mean is the best known and the most widely used
measure of central tendency. Indeed, the word “average” is thought
by many persons to designate the arithmetic mean, although the
arithmetic mean is only one of several “averages.”

Practically everyone knows how to find and use the so-called
average or arithmetic mean. It is commonly defined as the measure
resulting from dividing the sum of the measures in the distribution
by the number of measures. Thus the arithmetic mean of the scores
93, 90, 89, 88, and 86 is 446 ÷ 5 = 89.2. The value of this measure
lies in the fact that it lends itself to describing by means of a single
term a group of widely varying scores or measures. It expresses in
very compact form one specific fact about the scores in which each
single score has a part. On this account it is one of the basic statistical
measures of central tendency.


Computing the arithmetic mean of grouped data 


The arithmetic mean can also be readily found from a frequency 
distribution. In order to make the procedure somewhat more definite 
it is advisable to redefine the term arithmetic mean. When considered
from this point of view the arithmetic mean is defined as a point on
the scale such that the sum of the deviations of the values larger
than it exactly equals the sum of the deviations of the values smaller
than it. Expressed in physical terms, it may be thought of as the point
at which the fulcrum must be placed in order to balance the scale,
when it is considered as a beam of varying thickness or density. This
point may be determined experimentally or by mathematical
calculation. Without regard to the method employed, the fulcrum must
be so placed that the moments of the forces on one side are exactly
equaled by the moments of the forces on the other side.

Figure 24 illustrates the principle of moments of force by a beam 
in balance when a weight of one pound is suspended three feet from 
the fulcrum and a weight of three pounds is suspended one foot from 
the fulcrum. 


Fig. 24. Moments of force and the arithmetic mean 


The parallel between the physical lever and the mathematical cal- 
culation of the arithmetic mean is quite close. The problem in each 
case is to balance the forces on either side of a point to be determined. 
If the physical lever is out of balance, the correction is made by
moving the fulcrum in the direction of the heavier end until equilib- 
rium is established. In calculating the arithmetic mean a sort of trial 
balance is taken. If the moments of force are too great on one side, 
the point of rotation is similarly moved in the direction of the heavy 
end until the difference between these two forces becomes zero. 

This may be aptly illustrated by a procedure which classroom 
teachers have undoubtedly frequently used. For example, it is desired 
to obtain the average of the scores of 93, 90, 89, 88, and 86. By in- 
spection it may be seen that 89 is approximately the correct mean. 
The 90 is one point too large, the 93 is four points too large. In a
corresponding way 88 is one point too small and 86 is three points too
small. The total of the differences above the assumed mean is five
and the total of the differences below the assumed mean is four;
therefore the assumed mean of 89 is too small by the amount of this
difference divided by the number of cases. Since this is equal to
1 ÷ 5, or .2, the mean is 89.2. This checks exactly with the mean
found earlier by the method of totaling the measures and then dividing
by the number of measures.
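
The agreement between the two methods can be checked with a short Python sketch. The function names `mean_direct` and `mean_from_assumed` are assumptions introduced for this illustration; the five scores and the assumed mean of 89 are those of the example.

```python
def mean_direct(scores):
    """Sum of the measures divided by the number of measures."""
    return sum(scores) / len(scores)


def mean_from_assumed(scores, assumed):
    """Assumed mean plus the average deviation from the assumed mean."""
    deviations = [s - assumed for s in scores]
    return assumed + sum(deviations) / len(scores)


scores = [93, 90, 89, 88, 86]
deviations = [s - 89 for s in scores]   # +4, +1, 0, -1, -3
print(sum(deviations))                  # net difference above the assumed mean
print(mean_direct(scores))
print(mean_from_assumed(scores, 89))
```

The deviations total 5 above and 4 below the assumed mean, a net of 1, and both functions return the same mean of 89.2 (up to floating-point rounding).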

This method of computing the arithmetic mean will now be applied 
to the grouped frequency distribution given in Table 13 for thirty- 
seven reading test scores. 

(1) Assume a value for the mean. The midpoint of a class interval 
near the middle of the frequency distribution should be taken as 
the assumed mean. This class interval is usually chosen so that it 
fairly closely approximates the arithmetic mean. As a matter of 
fact, however, the results will be the same regardless of the particular 
interval whose midpoint is chosen as the assumed mean. In the
illustration of Table 14, it has been estimated that the arithmetic
mean will fall in or near the interval 50-52, so the assumed mean
is 51.00. For reasons that will be discussed in a later section of this
chapter, it is common practice in computing the arithmetic mean 
to assume that all scores in each interval have the value of the 
midpoint. 

(2) Lay off the deviations from the assumed mean by intervals.
Fill in the d column by assigning a deviation of 0 to the class interval
in which the assumed mean is located and then counting both
upward and downward from that interval by units. Deviations above
the assumed mean are positive and those below the assumed mean
are negative. Since this is equivalent to showing the number of class
intervals by which each interval deviates from the one containing the
assumed mean and also its direction from that interval, the deviations
are said to be stated in terms of class intervals.


TABLE 14. Computation of the arithmetic mean for the grouped 
frequency distribution of 37 reading test scores 


Class Interval (c.i.)
Integral     Mid-      Frequency    Deviation
Limits       point        (f)          (d)          fd

71-73         72           1           +7           +7
68-70         69           2           +6          +12
65-67         66           1           +5           +5
62-64         63           1           +4           +4
59-61         60           2           +3           +6
56-58         57           2           +2           +4
53-55         54           3           +1           +3
50-52         51           4            0            0
47-49         48           6           -1           -6
44-46         45           5           -2          -10
41-43         42           3           -3           -9
38-40         39           2           -4           -8
35-37         36           1           -5           -5
32-34         33           2           -6          -12
29-31         30           1           -7           -7
26-28         27           0           -8            0
23-25         24           1           -9           -9

N = 37

(1) Assume a value for the mean (51.00)
(2) Lay off the deviations from the assumed mean by intervals
(3) Add the fd column to the table
(4a) Obtain the products of the frequencies and deviations by
     intervals
(4b) Determine the algebraic sum of the deviations of the scores
     Σfd = +41 + (-66) = -25
(4c) Determine the mean of the deviations of the scores
     Σfd/N = -25/37 = -.68
(4d) Convert the mean of the deviations of the scores to the scale
     value
     c.i. × Σfd/N = 3 × -.68 = -2.04
(5) Obtain the arithmetic mean
     Assumed mean + c.i. × Σfd/N = 51.00 + (-2.04) = 48.96 (A.M.)

(3) Add a column at the right of the table and label it fd. This 
column is used for recording the products of the frequencies and the 
corresponding deviations. 

(4) Obtain the correction to the assumed mean. Obtain this cor- 
rection by use of the following steps of procedure: 

(a) Obtain the products of the frequencies and deviations by
intervals. Multiply each frequency by its corresponding deviation
and place the results in the fd column. Products below the assumed
mean will have negative signs.

(b) Determine the algebraic sum of the deviations of the
scores. Add the fd values algebraically to obtain Σfd by determining
the sum of the positive values, the sum of the negative values, and
then assigning the sign of the larger value to their difference. This is
equivalent to finding the magnitude of forces at one end and at the
other end and then obtaining their difference, in the illustration of a
beam resting on a fulcrum. Since for the distribution of Table 14,
the positive fd values total 41 and the negative fd values total -66,
their algebraic sum is -25. This is shown graphically in Figure 25,
which illustrates the scores of the frequency distribution of Table 14
distributed along a beam resting on a fulcrum at the point of the
assumed mean.


Assumed Mean = 51.00     Arithmetic Mean = 48.96     +41


Fig. 25. Moments of force for the 37 reading test scores 


(c) Determine the mean of the deviations of the scores.
Divide Σfd by N to obtain Σfd/N, retaining the proper sign. In order
to bring about an exact balance of these two forces, the fulcrum must
be moved slightly in the direction of the heavier end of the scale,
which is, in this case, in a minus direction. Since there are 37
measures in the distribution and each of them contributes equally to the
resultant force of -25 units, the average correction is the result of
dividing -25 by 37, or -.68.²

(d) Convert the mean of the deviations of the scores to the
scale value. Since each interval in this table is three units in size, it
is necessary to multiply -.68 by 3 to turn the correction into scale
units. The resulting value of -2.04 represents the amount by which
the assumed mean must be corrected.

2 The question of how many places to carry decimals constantly arises in
statistical work. For the computations here, carry the calculations to
three decimal places and round them back to two. For example, the
division of -25 by 37 results in a decimal of -.675, which should be
rounded back to -.68. If the decimal had been -.674, the decimal in the
third position would have been dropped and the value would have
become -.67.
(5) Obtain the arithmetic mean. The arithmetic mean results
from the algebraic addition of the assumed mean and the correction.
As the sign of the correction in the illustration is negative, the
correction is subtracted from the assumed mean. Therefore, the
arithmetic mean is 51.00 - 2.04, or 48.96. (A.M.) This step is equivalent
in the illustration of Figure 25 to moving the fulcrum 2.04 score units
downward to bring the beam into balance.


Summary of steps in computing the arithmetic mean 
of grouped data 


The steps below summarize the procedure outlined in detail above
for computing the arithmetic mean (A.M.) from a grouped
frequency distribution.


(1) Assume a value for the mean. Choose the midpoint of an interval
near the middle of the distribution.


(2) Lay off the deviations from the assumed mean by intervals. Write
in the d column a value of 0 for the interval in which the assumed
mean lies and write values for other intervals by counting upward
(positive signs) and downward (negative signs) by units.


(3) Add a column at the right of the table and label it fd. 


(4) Obtain the correction to the assumed mean: 

(a) Obtain the products of the frequencies and deviations by
intervals. Multiply each frequency by its corresponding deviation,
retaining negative signs for intervals below the assumed
mean, and carry the results to the fd column,

(b) Determine the algebraic sum of the deviations of the scores.
Algebraically add the values in the fd column obtained in step
(4a) and retain the appropriate sign (Σfd),

(c) Determine the mean of the deviations of the scores. Divide
the result of step (4b) by the number of cases, retaining the
appropriate sign (Σfd/N), and

(d) Convert the mean of the deviations of the scores to the scale
value. Multiply the result of step (4c) by the size of the class
interval (c.i. × Σfd/N).


(5) Obtain the arithmetic mean. Algebraically add the correction of
step (4d) to the assumed mean to obtain the arithmetic mean.
A.M. = Assumed Mean + (c.i. × Σfd/N).
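The summary above can be sketched in a few lines of Python. This is only an illustration: the frequencies are those of the 37 reading test scores as they appear in Table 18, the function and variable names are hypothetical, and the midpoint rule assumes whole-number scores. Because the sketch does not round the mean deviation before multiplying by the class interval, it gives 48.97 where the hand computation in the text gives 48.96.

```python
# Frequencies of the 37 reading test scores, keyed by the lower integral
# limit of each class interval (23-25, 26-28, ..., 71-73).
FREQUENCIES = {23: 1, 26: 0, 29: 1, 32: 2, 35: 1, 38: 2, 41: 3, 44: 5, 47: 6,
               50: 4, 53: 3, 56: 2, 59: 2, 62: 1, 65: 1, 68: 2, 71: 1}
CLASS_INTERVAL = 3


def grouped_mean(frequencies, class_interval, assumed_lower_limit):
    """Arithmetic mean of grouped data by the assumed-mean method."""
    # Step 1: the assumed mean is the midpoint of the chosen interval
    # (for whole-number scores, 50-52 has the midpoint 51).
    assumed_mean = assumed_lower_limit + (class_interval - 1) / 2
    n = sum(frequencies.values())
    # Steps 2 to 4b: d counts intervals above (+) or below (-) the
    # assumed mean; the fd products are summed algebraically.
    sum_fd = sum(f * ((lower - assumed_lower_limit) // class_interval)
                 for lower, f in frequencies.items())
    # Steps 4c and 4d: mean deviation, converted to scale units.
    correction = class_interval * sum_fd / n
    # Step 5: add the correction to the assumed mean.
    return assumed_mean + correction, sum_fd, n


mean, sum_fd, n = grouped_mean(FREQUENCIES, CLASS_INTERVAL, assumed_lower_limit=50)
print(sum_fd, n, round(mean, 2))  # -25 37 48.97
```

Choosing a different interval for the assumed mean changes the size of the correction but not the final result, which is the point of the method.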
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Exercises in Computing the Arithmetic Mean 


4. Compute the arithmetic mean of the 40 arithmetic test scores tabulated
in a frequency distribution for Problem 1, page 317. (A.M. =
54.90)

5. Compute the arithmetic mean of the 30 language test scores tabulated
in a frequency distribution for Problem 2, page 317. (A.M. =
42.15)

6. Compute the arithmetic mean of the 30 spelling test scores tabulated
in a frequency distribution for Problem 3, page 317. (A.M. =
13.03)


Computing the mid-measure 


Early workers with tests popularized the practice of taking the 
score of the middle paper of a pile of test papers arranged in order 
of size of scores as the expression of the central tendency of the group. 
The ease with which this so-called median is found has appealed to 
the classroom teacher. For a long time this was called the median. 
However, more recent workers with tests have recognized that the 
score of the middle paper in a pile of test papers stacked in order 
of size of scores is not the same as the middle point on the scale of a 
frequency table of these same scores. Accordingly, a distinction is 
now made between the score found on the middle paper of a pile of 
stacked papers and the median proper. The score of the middle paper 
of a pile of test papers arranged in systematic order is called the 
mid-measure to distinguish it from the median, which is the cor- 
responding value when the data are grouped in a frequency dis- 
tribution. Thus the mid-measure is a counting median found from 
ungrouped data. The median is computed only from tabulated data. 
The method of computing the mid-measure is illustrated by referring
to the data given in Table 10, page 310, where these scores are arranged
in descending order. The mid-measure is the score of the
middle paper when the number of cases is odd, or the average of the
two scores nearest the middle when the number of cases is even.
In this case the number of papers or scores is 37 (odd). Thus the
mid-measure is 48, a score such that there are just as many scores
equal to or larger than it as there are equal to or smaller than it.
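A minimal Python sketch of the mid-measure rule follows. The 37 raw reading scores are not reproduced in this section, so the illustration uses the 15 Class A scores of Table 17 instead; the function name is hypothetical.

```python
# Class A scores from Table 17 (15 cases, so the number of cases is odd).
CLASS_A = [122, 116, 108, 101, 96, 92, 89, 86, 83, 80, 76, 71, 64, 56, 50]


def mid_measure(scores):
    """Score of the middle paper, or the average of the two middle papers."""
    ordered = sorted(scores, reverse=True)  # stack the papers in order of size
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(mid_measure(CLASS_A))       # 86
print(mid_measure(CLASS_A[:-1]))  # 87.5 (14 cases: average of 89 and 86)
```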
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Computing the median of grouped data 


By definition, the mid-measure and the median are quite similar, 
the main distinction being that the mid-measure is designated as an 
actual score on a certain paper (or the average of the scores on the 
two middle papers) while the median is defined directly in terms 
of a point on the scale of the frequency table on which it is 
based. The median is a point on the scale such that 50 per cent of the
cases in the distribution are above it and 50 per cent of the cases 
are below it. 

The method of computing the median from a grouped frequency 
distribution is presented below and illustrated in Table 15 for the
same group of reading test scores used previously in this chapter. 

(1) Obtain the half-sum. Divide the number of cases by two to
determine how many of the cases fall below the median. For this
illustration the half-sum, or N/2, is 37 ÷ 2, or 18.5.

(2) Obtain the sub-total. Count upward into the distribution, 
adding the frequency for each successive interval, until exactly the 
half-sum or a number as closely approaching it as possible without 
exceeding it is reached. Thus, in the illustration, t} o 4- r+ 2 4- 
1+2-+3-+5= 15. If the six scores in the interval 47-49 were 
added, the result, 21, would exceed the half-sum. The median, there- 
fore, lies somewhere in the interval 47-49, for less than half of the 
scores lie below that interval and less than half of the scores lie 
above it. 

(3) Determine the correction. The three steps involved are to 
determine the correction: 

(a) In terms of measures. Subtract the sub-total from the 
half-sum. This subtraction will give the number of cases in the 
interval in which the median lies which must be added to the sub- 
total to obtain the half-sum, and consequently shows how much 
farther counting must continue upward to obtain the median. In the
distribution of Table 15, this step becomes 18.5 - 15 = 3.5.

(b) In terms of the class interval. Divide the result of the 
preceding step (half-sum — sub-total) by the number of cases in the 
interval in which the median falls. This will give the proportion of 
the interval that must be added to lower intervals in order to reach 
the point below which half of the cases fall. For the illustration of
Table 15, 3.5 ÷ 6 = .58. This step is based on the assumption that
the scores in an interval are uniformly distributed. More will be said
about this assumption in a following section.

(c) In terms of the scale distance. Multiply the result of the
preceding step by the size of the class interval so that the correction
will be stated as a scale distance. Thus, .58 × 3 = 1.74 for the
accompanying illustration.


TABLE 15. Computation of the median for the grouped frequency 
distribution of 37 reading test scores 


Class Interval (c.i.)
Integral     Real           Frequency
Limits       Limits         (f)

71-73        70.5-73.5       1
68-70        67.5-70.5       2
65-67        64.5-67.5       1
62-64        61.5-64.5       1
59-61        58.5-61.5       2
56-58        55.5-58.5       2
53-55        52.5-55.5       3
50-52        49.5-52.5       4
47-49        46.5-49.5       6
44-46        43.5-46.5       5
41-43        40.5-43.5       3
38-40        37.5-40.5       2
35-37        34.5-37.5       1
32-34        31.5-34.5       2
29-31        28.5-31.5       1
26-28        25.5-28.5       0
23-25        22.5-25.5       1

                        N = 37

1.  Obtain the half-sum: N/2 = 37/2 = 18.5
2.  Obtain the sub-total: 1 + 0 + 1 + 2 + 1 + 2 + 3 + 5 = 15
3a. Determine the correction (Measures): 18.5 - 15 = 3.5
3b. Determine the correction (Class interval): 3.5 ÷ 6 = .58
3c. Determine the correction (Scale distance): .58 × 3 = 1.74
4.  Obtain the median: 46.50 + 1.74 = 48.24 (Mdn.)
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(4) Obtain the median. Now add the correction in terms of scale 
distance to the lower real limit of the interval in which the median 
lies to obtain the median. The correction of 1.74 added to 46.50, or 
the lower real limit of the interval in Table 15 which contains the 
median, gives 48.24 (Mdn.).

Obviously, if the calculations of these steps were made by adding 
the frequencies down from the top of the distribution the median 
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would be the same. In that case, 16 scores falling above the interval 
47-49, the correction of 1.26 (2.5 measures, .42 of an interval, 1.26 in 
terms of scale units) would be subtracted from 49.50, the top of the 
step, to give the same result of 48.24 for the median. 
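The counting procedure, upward and downward, can be sketched in Python as follows. The function names are hypothetical, the frequencies are those of Table 15, and the sketch does not treat the special case in which the counting reaches the half-sum exactly. It carries full precision, so it returns 48.25 where the hand computation, which rounds the proportion to .58, gives 48.24.

```python
# (lower integral limit, frequency) for each interval, lowest interval first.
INTERVALS = [(23, 1), (26, 0), (29, 1), (32, 2), (35, 1), (38, 2),
             (41, 3), (44, 5), (47, 6), (50, 4), (53, 3), (56, 2),
             (59, 2), (62, 1), (65, 1), (68, 2), (71, 1)]
CLASS_INTERVAL = 3


def grouped_median(intervals, class_interval):
    """Median of grouped data, counting upward from the lowest interval."""
    half_sum = sum(f for _, f in intervals) / 2           # step 1
    sub_total = 0
    for lower, f in intervals:                            # step 2
        if sub_total + f > half_sum:
            measures = half_sum - sub_total               # step 3a
            proportion = measures / f                     # step 3b
            correction = proportion * class_interval      # step 3c
            return (lower - 0.5) + correction             # step 4
        sub_total += f
    raise ValueError("distribution has no cases")


def grouped_median_from_top(intervals, class_interval):
    """The same median, counting downward and subtracting the correction."""
    half_sum = sum(f for _, f in intervals) / 2
    sub_total = 0
    for lower, f in reversed(intervals):
        if sub_total + f > half_sum:
            upper_real_limit = lower - 0.5 + class_interval
            correction = (half_sum - sub_total) / f * class_interval
            return upper_real_limit - correction
        sub_total += f
    raise ValueError("distribution has no cases")


print(round(grouped_median(INTERVALS, CLASS_INTERVAL), 2))           # 48.25
print(round(grouped_median_from_top(INTERVALS, CLASS_INTERVAL), 2))  # 48.25
```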


Summary of steps in computing the median 


The steps listed below provide in form for easy use the procedures 
necessary in computing the median from a grouped frequency dis- 
tribution. 


(1) Obtain the half-sum. Divide the number of cases by 2. (N/2)

(2) Obtain the sub-total. Count upward into the distribution by adding
successive frequencies until a number exactly equal to the half-sum
or as closely approaching it as possible without exceeding it
is reached.³

(3) Determine the correction:

(a) In terms of measures by subtracting the sub-total from the
half-sum,

(b) In terms of the class interval by dividing the result of step
(3a) by the number of cases in the interval in which the
median lies, and

(c) In terms of the scale distance by multiplying the result of
step (3b) by the size of the class interval.

(4) Obtain the median. Add the correction of step (3c) to the lower
real limit of the interval in which the median lies to obtain the
median. (Mdn.)


Exercises in Computing the Mid-Measure and Median 


7. Find the mid-measure of the 40 arithmetic test scores of Problem 1,
page 317.

8. Find the mid-measure of the 30 language test scores of Problem 2,
page 317.

9. Compute the median of the 40 arithmetic test scores tabulated in a
frequency distribution for Problem 1, page 317. (Mdn. = 53.79)

10. Compute the median of the 30 language test scores tabulated in a
frequency distribution for Problem 2, page 317. (Mdn. = 43.50)


³ If exactly the half-sum is reached, the median is usually the upper real limit
of the interval the frequency of which was last added in the counting process. 
However, if the next higher interval should happen to have a zero frequency, the 
median is the midpoint of that interval. 
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Assumptions in computing measures of central tendency 


As has been indicated briefly in a preceding section of this chapter, 
the assumption concerning the distribution of scores within each 
class interval varies according to which of the measures of central 
tendency is being computed. Figure 26 shows, in parallel, graphical
representations of the distribution of scores assumed in the computation
of the arithmetic mean and the median for several class intervals
near the center of the distribution used for illustrative purposes in
the preceding pages.

[Figure: two parallel scales for the class intervals 44-46 (f = 5), 47-49 (f = 6), 50-52 (f = 4), 53-55 (f = 3), and 56-58 (f = 2). Left panel, "Arithmetic Mean": real limits and midpoints, with all measures in an interval placed at its midpoint. Right panel, "Median": the same real limits, with each interval divided into equal parts, one for each measure in it.]

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Fig. 26. Assumptions concerning the distribution of scores in class 
intervals in the computation of the arithmetic mean and median 


In the computation of the arithmetic mean it is assumed that each 
score in a grouped frequency distribution has the value of the mid- 
point of the interval in which it is tabulated. This is illustrated by 
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the heavily ruled lines in the left-hand portion of Figure 26. On the 
other hand, in the computation of the median it is assumed that each 
score in a grouped frequency distribution expands or contracts in 
such manner that it shares the scale distance through a class interval 
equally with the other measures in the same class interval. This 
assumption is illustrated in the right-hand portion of Figure 26. 
Thus, each of the five measures in the interval 44-46 is assumed to
have the value of 45 in computing the arithmetic mean and to occupy
one-fifth of the scale distance through that interval (1/5 × 3
= 0.6) in computing the median. Again, the three scores in the step
53-55 are assumed in computing the arithmetic mean to be concentrated
at 54, the midpoint of the interval, while in computing the
median each of the scores is assumed to occupy one-third of the
scale distance through that interval (1/3 × 3 = 1.0).
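The two assumptions can be made concrete with a short Python sketch. The function name is hypothetical; for a given interval it returns the value assigned to every measure under the arithmetic-mean assumption and the boundaries of the equal shares under the median assumption.

```python
def assumed_positions(lower_limit, frequency, class_interval=3):
    """Assumed placement of the measures within one class interval."""
    lower_real_limit = lower_limit - 0.5
    midpoint = lower_real_limit + class_interval / 2
    # Arithmetic mean: every measure is given the value of the midpoint.
    mean_values = [midpoint] * frequency
    # Median: the measures share the scale distance of the interval equally.
    share = class_interval / frequency
    median_boundaries = [round(lower_real_limit + k * share, 2)
                         for k in range(frequency + 1)]
    return mean_values, median_boundaries


values, boundaries = assumed_positions(44, 5)
print(values)      # [45.0, 45.0, 45.0, 45.0, 45.0]
print(boundaries)  # [43.5, 44.1, 44.7, 45.3, 45.9, 46.5]
```

For the interval 53-55 with three measures the same call gives the midpoint 54 and the boundaries 52.5, 53.5, 54.5, and 55.5.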

This leads to one further important distinction between the arith- 
metic mean and the median. The mean is algebraic in nature (al- 
though the various operations can be stated either in algebraic or in 
arithmetic terms), while the median is arithmetic in nature. 


B MEASURES OF VARIABILITY


Need for measures of variability 


The measures of central tendency represented by the arithmetic 
mean and the median are valuable statistical measures but they de- 
scribe only one characteristic of the data—the tendency of the scores 
to pile up at or near the middle of the distribution. Descriptions of 
test results based wholly on one or the other of these measures are 
incomplete. 

The two groups of scores presented as Class A and Class B in 
Table 16 illustrate this situation clearly. The means of the two series 
of scores are identical, each being 86. The range of the scores for 
Class A is 72 (122 — 50), which is exactly three times the range 
(98 — 74 = 24) of scores for Class B. Even an inexperienced teacher 
will recognize that very different ranges of ability are present in these 
two classes and that correspondingly different instructional prob- 
lems are presented to the teacher. 

A graphical illustration based on other data showing the un- 
likenesses that may appear in distributions having the same mean is 
given in Figure 27.
TABLE 16. Data showing identical means but unlike variability


Class A Class B 


It is a common practice to show frequency distributions in graphical
form by representing the frequencies at a given point on the scale 
in terms of a line erected perpendicular to the base line or scale. If 


Fig. 27. Measures of variability for homogeneous and 
heterogeneous data 


the tops of a large number of these perpendiculars are connected, the 
result is a curved line that usually is close to the base line at the 
ends of the scale but that rises quite rapidly from the base line 
near the middle of the scale. In Figure 27 the curve marked x repre- 
sents the distribution of scores made on a certain test by a class. The 
closeness of the curve to the base line at the ends or extremes shows 
that there are relatively few very low and very high scores. The high 
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point of the curve near the middle indicates that a great many pupils 
made scores near the average. This is typical of situations usually 
found where considerable numbers of cases are involved. 

It will be noted in Figure 27 that, while the middle portion of
curve x rises much higher than the similar portion of curve y,
the extremes of curve x do not extend along the base line in either
direction as far as those of curve y. This flatness or peakedness of
the curve is the graphical indication of the variability of the data it 
represents. The less peaked the curve the greater the variability, 
other things being equal. It is thus apparent from this illustration 
that while the means of these two distributions are identical, very 
greatly different teaching problems are represented. Curve x repre- 
sents a relatively homogeneous group, while curve y represents a 
more widely scattered group. 


The range as an expression of variability 


The range of scores, that is, the scale distance between the lowest
and highest scores in a distribution, is one way of expressing variability.
However, it is one of the least reliable measures of variability
or dispersion, since it depends wholly on the two extreme scores and
fluctuates as they do.
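A small Python sketch shows how strongly the range depends on the extreme scores. It uses the Class A scores as listed in Table 17; the function name is hypothetical. Dropping only the single highest and single lowest paper changes the range from 72 to 60.

```python
# Class A scores as listed in Table 17.
CLASS_A = [122, 116, 108, 101, 96, 92, 89, 86, 83, 80, 76, 71, 64, 56, 50]


def score_range(scores):
    """Scale distance between the lowest and highest scores."""
    return max(scores) - min(scores)


print(score_range(CLASS_A))                # 72
print(score_range(sorted(CLASS_A)[1:-1]))  # 60
```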


The standard deviation as a measure of variability 


Such a simple method of expressing the variability of a distribution 
of test scores as the range is sufficient for some purposes, but where 
careful work of an analytical or research type is being done a more 
exact means of expressing variability must be used. The standard 
deviation, also called sigma (σ), has many characteristics that make
it a useful measure of variability. The standard deviation is a sort of
arithmetic mean of the squares of the deviations from the mean of
the distribution. It is a special type of mean of the deviations because
of the method used in computing it. In calculating the standard
deviation (σ), each individual deviation from the mean is squared,
the sum of these squared values is then divided by the number of such
deviations, and the square root of the result is then obtained. Restated,
the standard deviation (sigma) is the square root of the mean
of the squares of the deviations from the mean of a distribution.
Expressed in symbols the standard deviation becomes

$\sigma = \text{c.i.} \sqrt{\dfrac{\Sigma fd^2}{N} - c^2}$

where Σfd² equals the sum of the products of the frequency in each
interval by the square of the deviation of that interval from the
assumed mean, N equals the number of cases in
the distribution, c equals the correction as found in calculating the
mean (stated in class-interval units), and c.i. equals the size of the
class interval of the distribution in units.

The lines S and T of Figure 28 represent ordinates erected at a
distance equal to one σ on either side of the mean. The area between
these two ordinates is approximately 68 per cent (in a normal
distribution 68.26%) of the area under the curve of such a distribution. That is,
ordinates erected at a distance equal to one sigma on either side of
the arithmetic mean include approximately two-thirds of the cases
in the distribution.
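The percentage quoted for the normal distribution can be checked with the error function in Python's standard library. This is a sketch with a hypothetical function name. Computed directly, the area rounds to 68.27 per cent; the figure of 68.26 per cent comes from doubling the four-place table value .3413.

```python
import math


def normal_area_within(k):
    """Proportion of a normal distribution within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))


print(round(normal_area_within(1), 4))  # 0.6827
```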


Fig. 28. Arithmetic mean and standard deviation 


A further interesting characteristic of the standard deviation is 
also indicated in Figure 28. Mathematically, the value sigma bears a 
definite relationship to the curve of the distribution itself. Where any 
large number of cases or scores are found in a distribution there is 
a tendency for the larger portion of the cases to pile up at or near 
the middle of the distribution. When a normal distribution is pre- 
sented in graphical form, the result is a symmetrical bell-shaped 
curve with many cases in the middle and few cases at the extremes. 
Certain types of these characteristic bell-shaped distributions have 
come to be called normal curves. For these normal curves, formulae
have been derived from which such typical curves may be constructed
if certain basic data concerning the curve are given. In these formulae
sigma is one of the values that must be given in order to construct
such a curve. Sigma, in the typical formula, represents the distance
from the mean at which the curve changes from convex to concave. 
In Figure 28 the points where the curve changes its character are 
indicated by the ordinates lettered S and T. 
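The statement about the change of curvature can be verified from the usual formula for the normal curve. The short derivation below is a sketch in which M stands for the mean and y for the height of the curve at score x; these symbols are assumptions of the sketch and are not taken from the text.

```latex
y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(x-M)^2 / 2\sigma^2}

\frac{dy}{dx} = -\frac{(x-M)}{\sigma^2}\, y

\frac{d^2y}{dx^2} = \frac{y}{\sigma^2}\left[\frac{(x-M)^2}{\sigma^2} - 1\right]

\frac{d^2y}{dx^2} = 0 \quad\text{when}\quad (x-M)^2 = \sigma^2,
\quad\text{that is, when}\quad x = M \pm \sigma
```

The second derivative is negative between M - σ and M + σ and positive outside those points, so the curve changes its direction of bending exactly one sigma on either side of the mean.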

Thus, because of this direct mathematical relationship that the 
standard deviation bears to the curve of the distribution itself and 
the reliable expression of variability that it provides, since every 
deviation in the distribution is considered, the standard deviation is 
one of the most useful of the measures of variability. 


Computing the standard deviation of ungrouped data 


In the computation of the standard deviation of ungrouped data, 
as in the illustration of Table 17, the mean of the distribution must 
be found. When the data are grouped in a frequency table it is not 
strictly necessary for the arithmetic mean to be computed, although 
it is necessary to go through all steps of the process except the 
last. 

The steps in the computation of the standard deviation of un- 
grouped data are given in detail in Table 17. The scores used are 
those that appear for Class A. The mean of the scores for Class A 
is 86.00. Thus, a score of 89 deviates from this mean by 3 points. 
A score of 96 deviates 10 points. Other deviations are similarly shown
in the d column of Table 17. The standard deviation (σ) is the square
root of the mean of the squares of these deviations from the mean of
the scores. It is necessary, therefore, to square each of these deviations.
These squares are given under the column headed d². Since
each deviation appears only once and the data are ungrouped, the
formula may be simplified to read

$\sigma = \sqrt{\dfrac{\Sigma d^2}{N}}$

The sum of the deviations squared (Σd²) in this case is 6100. The mean of these
squared deviations is therefore 406.67. To turn it into units
of the scale, the square root of this quantity must be taken. This
value is 20.17, which is the standard deviation (σ) of this series of
scores. The mean of this distribution is 86.00 and the σ is 20.17.
Hence, approximately two-thirds of the scores will be found between
scores 20.17 points larger and 20.17 points smaller than this mean.
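The computation for Class A can be reproduced with a few lines of Python. This is a sketch with hypothetical names; it divides by N, as the text does, which corresponds to `statistics.pstdev` in Python's standard library rather than to the N - 1 form.

```python
import math

# Class A scores as listed in Table 17.
CLASS_A = [122, 116, 108, 101, 96, 92, 89, 86, 83, 80, 76, 71, 64, 56, 50]


def standard_deviation(scores):
    """Square root of the mean of the squared deviations from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    sum_d2 = sum((score - mean) ** 2 for score in scores)  # sum of d squared
    return math.sqrt(sum_d2 / n), mean, sum_d2


sd, mean, sum_d2 = standard_deviation(CLASS_A)
print(mean, sum_d2, round(sd, 2))  # 86.0 6100.0 20.17
```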
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TABLE 17. Computation of the standard deviation for ungrouped data 


Test       Deviations    Deviations
Scores     (d)           Squared (d²)

122        +36           1296
116        +30            900
108        +22            484
101        +15            225
 96        +10            100
 92        + 6             36
 89        + 3              9
 86          0              0
 83        - 3              9
 80        - 6             36
 76        -10            100
 71        -15            225
 64        -22            484
 56        -30            900
 50        -36           1296

1290                     Σd² = 6100

Computations:
σ = √(Σd²/N) = √(6100/15) = √406.67 = 20.17
A.M. = 1290/15 = 86.00


Computing the standard deviation of grouped data


The method of computing the standard deviation from ungrouped
data illustrated in Table 17 may be applied with few changes to the
calculation of sigma from grouped data. A slight change in the general
formula is required, for when the scores are grouped in class intervals
the deviations of the scores must be considered by groups having
the midpoints of the intervals in which they are found. This permits
the expression of the deviations in class intervals rather than in units
of the scale. The formula for use in calculating the standard
deviation when the data are grouped in a frequency distribution is

$\sigma = \text{c.i.} \sqrt{\dfrac{\Sigma fd^2}{N} - c^2}$

The steps in the application of this formula in the calculation of
the standard deviation of the scores originally presented in Table 9
will make clear all of the processes involved. The computations themselves
are shown in Table 18. The first five steps of procedure for
computing the standard deviation are identical with those given
above for determining the arithmetic mean.

(1) Assume a value for the mean. Assume a mean as near as pos- 
sible to the arithmetic mean of the distribution in order that the 
correction (c) may be as small as possible. In Table 18, as for the 
computation of the A.M. for the same distribution, the assumed
mean is taken as the midpoint of the interval 50-52. 


TABLE 18. Computation of the standard deviation for the grouped 
frequency distribution of 37 reading test scores 


Class
Interval      f      d      fd     fd²

71-73         1     +7      +7     49
68-70         2     +6     +12     72
65-67         1     +5      +5     25
62-64         1     +4      +4     16
59-61         2     +3      +6     18
56-58         2     +2      +4      8
53-55         3     +1      +3      3
50-52         4      0       0      0
47-49         6     -1      -6      6
44-46         5     -2     -10     20
41-43         3     -3      -9     27
38-40         2     -4      -8     32
35-37         1     -5      -5     25
32-34         2     -6     -12     72
29-31         1     -7      -7     49
26-28         0     -8       0      0
23-25         1     -9      -9     81

N = 37              Σfd = -25      Σfd² = 503

1.  Assume a value for the mean (51.00)
2.  Lay off the deviations from the assumed mean by intervals
3.  Obtain the correction to the assumed mean:
    Σfd = -25; Σfd/N = -25/37 = -.676 (c)
4.  Square the correction to the assumed mean: c² = (-.676)² = .457 (c²)
5.  Add the fd² column to the table
6a. Obtain the squared deviations of the scores (fd²)
6b. Obtain the sum of the squared deviations of the scores: Σfd² = 503
6c. Obtain the mean of the squared deviations of the scores:
    Σfd²/N = 503/37 = 13.595
6d. Obtain the corrected mean of the squared deviations of the scores:
    Σfd²/N - c² = 13.595 - .457 = 13.138
7a. Obtain the standard deviation (Class intervals):
    √(Σfd²/N - c²) = √13.138 = 3.62
7b. Obtain the standard deviation (Scale units):
    c.i. × √(Σfd²/N - c²) = 3 × 3.62 = 10.86 (S.D.)


(2) Lay off the deviations from the assumed mean by intervals. 
Lay off deviations above and below the interval in which the mean 
is assumed to lie. Signs of deviations below the assumed mean are 
negative.

(3) Obtain the correction to the assumed mean. Multiply the frequency
in each interval by its corresponding deviation and carry the
results to the fd column. Take account of signs. Obtain the sum of
the positive fd and the negative fd values separately and then determine
the algebraic sum of the entire column. Take proper account
of the signs of the two values first obtained. In the illustration of
Table 18, the Σfd is -25. Then obtain Σfd/N to determine the
correction (c). For the distribution of the table, the correction is
-25/37 or -.676.⁴ In this computation the correction is left in terms
of class intervals and is not, as is the case in computing the arithmetic
mean, converted into scale units.

(4) Square the correction to the assumed mean. Square the correction
in class-interval units ⁵ to obtain the second term under the
radical sign (c²) in the formula given above for the standard deviation.
For the illustration this becomes (-.676)², or .457.

(5) Add a column at the right of the table and label it fd². This
column is used for recording the squares of the deviations of the 
scores from the assumed mean. 

(6) Obtain the mean of the squared deviations of the scores. 
Obtain this mean by use of the following steps of procedure: 

(a) Obtain the squared deviations of the scores. Multiply
each value in the fd column by its corresponding d value and place
the results in the fd² column. All signs will be positive.

(b) Obtain the sum of the squared deviations of the scores.
Add the values in the fd² column to obtain Σfd², or the sum of the
squared deviations of the scores from the assumed mean. The Σfd²
of Table 18 is 503.

(c) Obtain the mean of the squared deviations of the scores.
Divide Σfd² by N to obtain the mean of the squared deviations of
the scores from the assumed mean, or Σfd²/N. This value for the
data of Table 18 is 503/37, or 13.595.

(d) Obtain the corrected mean of the squared deviations of
the scores. Subtract the square of the correction (c²) from Σfd²/N
to obtain the corrected mean of the squared deviations of the scores
from the arithmetic mean, i.e., to account for the deviation of the
assumed mean from the arithmetic mean.⁶ The result for the
accompanying illustration is 13.595 - .457, or 13.138.


⁴ Computations for the standard deviation should uniformly be carried to four
decimal places and rounded back to three decimal places, otherwise using the same
procedure for rounding numbers as was illustrated on page 321. The standard
deviation itself is commonly stated to two decimal places only, however.

⁵ The student will find the tables of squares and square roots such as are given
in Lindquist, A First Course in Statistics, Revised edition (Houghton Mifflin Co.,
Boston, 1942), pages 230-39, very helpful in his work from this point on. Use of
such tables, a slide rule, or an electrical calculating machine should speed up his
work and result in a high degree of accuracy.

(7) Obtain the standard deviation. Complete the process of finding 
the standard deviation by using the following two steps of procedure: 
(a) In terms of class intervals. To obtain the standard deviation
in units of class interval, extract the square root of the mean
of the squared deviations of the scores from the arithmetic mean,
i.e., obtain $\sqrt{\dfrac{\Sigma fd^2}{N} - c^2}$. The square root of 13.138 in the
accompanying illustration is, to two decimal places, 3.62.

(b) In terms of the scale distance. To put the standard
deviation into scale units, multiply the square root of the mean of
the squared deviations of the scores from the arithmetic mean by
the size of the class interval, i.e., obtain
$\text{c.i.} \sqrt{\dfrac{\Sigma fd^2}{N} - c^2}$. For the
data of Table 18, this becomes 3 × 3.62, or 10.86.

This means that approximately 68 per cent of the scores will be
found between ordinates erected at a distance of 10.86 score units
on either side of the mean. Of course this will not be strictly true,
since no distribution having as few scores as 37 is likely to approximate
the normal curve very closely. However, approximately
two-thirds of the scores may be expected to lie between the two 
points one standard deviation above and one standard deviation 
below the arithmetic mean. For this illustration the points are 
48.96 + 10.86 and 48.96 — 10.86, or 59.82 and 38.10. Actually 25 
scores, or 67.6 per cent of the total, lie between these two points. 
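The whole computation of Table 18 can be sketched in Python as follows. The names are hypothetical and the frequencies are those of Table 18. The sketch carries full precision, so it gives 10.87 where the hand computation, which rounds the square root to 3.62 before multiplying, gives 10.86. The count of cases within one standard deviation of the mean places each score at the midpoint of its interval, which is an assumption of the sketch; on that basis it also arrives at 25 of the 37 cases.

```python
import math

# Frequencies keyed by the lower integral limit of each class interval.
FREQUENCIES = {23: 1, 26: 0, 29: 1, 32: 2, 35: 1, 38: 2, 41: 3, 44: 5, 47: 6,
               50: 4, 53: 3, 56: 2, 59: 2, 62: 1, 65: 1, 68: 2, 71: 1}
CLASS_INTERVAL = 3
ASSUMED_LOWER_LIMIT = 50  # assumed mean = midpoint of 50-52 = 51


def grouped_sd(frequencies, class_interval, assumed_lower_limit):
    """Mean and standard deviation of grouped data by the assumed-mean method."""
    n = sum(frequencies.values())
    d = {lower: (lower - assumed_lower_limit) // class_interval
         for lower in frequencies}
    sum_fd = sum(f * d[lower] for lower, f in frequencies.items())
    sum_fd2 = sum(f * d[lower] ** 2 for lower, f in frequencies.items())
    c = sum_fd / n                                    # step 3 (class intervals)
    sd_intervals = math.sqrt(sum_fd2 / n - c ** 2)    # steps 4 to 7a
    assumed_mean = assumed_lower_limit + (class_interval - 1) / 2
    mean = assumed_mean + class_interval * c
    return mean, class_interval * sd_intervals, sum_fd2   # step 7b


mean, sd, sum_fd2 = grouped_sd(FREQUENCIES, CLASS_INTERVAL, ASSUMED_LOWER_LIMIT)
print(sum_fd2, round(sd, 2))  # 503 10.87

# Cases whose interval midpoint lies within one S.D. of the mean.
within = sum(f for lower, f in FREQUENCIES.items()
             if mean - sd <= lower + 1 <= mean + sd)
print(within, round(100 * within / 37, 1))  # 25 67.6
```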


Summary of steps in computing the standard 
deviation of grouped data 


The steps of procedure for computing the standard deviation from 
a grouped frequency distribution are as follows: 


⁶ The square of the correction (c²) is always subtracted from Σfd²/N because
the latter value is always too large in instances where the assumed mean and arithmetic
mean are not identical. If c has any value other than zero, the assumed mean
and the arithmetic mean do not coincide and the deviations computed about the
assumed mean are too large. The square of the correction (c²) must be subtracted
from Σfd²/N to compensate for this difference.
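The reason the correction enters as c² can be shown in a few lines of algebra. The derivation below is a sketch in class-interval units; it uses only the definitions given in the text, namely c = Σfd/N and Σf = N, and measures each deviation from the arithmetic mean as d - c.

```latex
\frac{\Sigma f(d-c)^2}{N}
  = \frac{\Sigma fd^2}{N} - 2c\,\frac{\Sigma fd}{N} + c^2\,\frac{\Sigma f}{N}
  = \frac{\Sigma fd^2}{N} - 2c^2 + c^2
  = \frac{\Sigma fd^2}{N} - c^2
```

Since c² can never be negative, the mean of the squared deviations about the assumed mean is never smaller than that about the arithmetic mean, which agrees with the statement of the footnote.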


(1) Assume a value for the mean. Follow the procedure of step (1),
page 322, for computing the arithmetic mean.


(2) Lay off the deviations from the assumed mean by intervals. Follow
the procedure of step (2), page 322, for computing the arithmetic
mean.


(3) Obtain the correction to the assumed mean. Follow the procedures
of steps (4a), (4b), and (4c), page 322, for computing the arithmetic
mean. As the correction (c) is to be stated in class-interval
rather than in scale units, do not include step (4d).


(4) Square the correction to the assumed mean. Square the result of
step (3) to obtain c².


(5) Add a column at the right of the table and label it fd².


(6) Obtain the mean of the squared deviations of the scores.

(a) Obtain the squared deviations of the scores. Multiply each fd
value by its corresponding d value and write the products in
the fd² column.

(b) Obtain the sum of the squared deviations of the scores. Add
the values in the fd² column to obtain Σfd².

(c) Obtain the mean of the squared deviations of the scores. Divide
the result of step (6b) by the number of cases to obtain
Σfd²/N.

(d) Obtain the corrected mean of the squared deviations of the
scores. Subtract the result of step (4) from the result of step
(6c) to obtain $\dfrac{\Sigma fd^2}{N} - c^2$.


(7) Obtain the standard deviation:

(a) In terms of class intervals. Extract the square root of the
result of step (6d) to obtain $\sqrt{\dfrac{\Sigma fd^2}{N} - c^2}$.

(b) In terms of the scale distance. Multiply the result of step
(7a) by the size of the class interval to obtain the standard
deviation in scale units.
$\text{S.D.} = \text{c.i.} \sqrt{\dfrac{\Sigma fd^2}{N} - c^2}$.


Exercises in Computing the Standard Deviation 


11. Compute the standard deviation of the 40 arithmetic test scores
tabulated in a frequency distribution for Problem 1, page 317.
(S.D. = 9.63)

12. Compute the standard deviation of the 30 language test scores
tabulated in a frequency distribution for Problem 2, page 317.
(S.D. = 18.00)
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Interpreting the Results 
of Measurement 


This chapter gives consideration to the following points concerning
the interpretation of results from measurement:


A. Meaning of test scores.
B. Formal and informal types of derived scores.
C. Graphical representations: frequency polygons and histograms.
D. Cumulative frequency graphs in estimating percentiles and
percentile ranks.
E. Norms: age, grade, and percentile types.

The major techniques used in summarizing and describing single 
sets of test scores were presented in Chapter 12. Various techniques 
and devices for attributing additional meaning to test scores and 
distributions of test results are necessary, however, if optimum use 
of such results is to be attained. This chapter deals with the basic 
procedures and devices commonly used in attributing readily under- 
standable meaning to the results of measurement. 


A TEST SCORES


The problems of summarizing test scores and of interpreting the 
results revealed by these summaries are very closely related and, 
were it not for the length and detail of the discussion they require, 
would probably be considered in a single chapter. The preceding 
chapter deals with three of the six major problems of statistical pro- 
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cedure as related to test analysis. This chapter concerns itself almost 
entirely with two more of these problems of test interpretation. The 
last problem will be treated in Chapter 14. 


Meaning of a test score 


Test scores are valuable to the classroom teacher to the extent that 
they can be interpreted. It is therefore important to define clearly 
what is meant by a test score. To do this, two or three new concepts 
require explanation. In the first place, a test score is a numerical 
expression of performance on the part of an individual. Sometimes 
the test score is merely the number of items answered correctly. 
Again it may be an arbitrarily defined scale value. But whatever its 
form, its function is to reveal in a quantitative way the performance 
of an individual as he responds to stimuli given under certain con- 
ditions. 

This leads to the second concept involved in the meaning of a score. 
The test score is an evidence of performance. Performance, the 
response of the individual to the test situation, is taken to mean in 
educational measurements the expression of ability operating under 
certain conditions. Performance may be thought of as Ability + 
Conditions. Scores on tests are definitely influenced by conditions. 
The pupil may make a low score because he does not have the ability 
to do better. On the other hand, he may make a low score because of 
illness, discomfort, poor illumination, a broken pencil, indifference 
for the subject, dislike for the teacher or examiner, failure to give 
attention to and to comprehend the directions, or any one of many 
other reasons. Accordingly, there is the possibility and even the like- 
lihood of a serious error in the assumption that a test score is direct 
evidence of ability. The conditions under which the performance 
takes place must be known before it is safe to infer ability from 
performance. 

Ability, as an abstract concept, may be defined as the power to do. 
Power to do, to respond to stimuli and to situations, is the product 
of training and experience. Ability may be thought of as Capacity + 
Training, which suggests that unless training and native capacity 
factors are known inferences about abilities may be misleading. This 
point becomes particularly serious in the interpretation of intelli- 
gence test results, for it is a common practice for users of such tests 
to infer capacity (mental ability) from performance scores. The real 
seriousness of this type of uncritical inference may be seen by 
comparing the interpretations of an achievement test score and an 
intelligence test score. Both are basically expressions of performance. 
Equal abilities may be inferred from equal scores in both types of 
tests if and when the conditions under which they are given are all 
definitely under control. While it is difficult to make sure that all 
physical and psychological factors are adequately controlled in a 
testing situation, it is possible to regulate most of the mechanical 
conditions within reasonable limits. The significant point to note 
here, however, is the fact that users of achievement tests stop with 
an inference of equality of ability from equal performance scores, 
but users of intelligence tests are obliged to make a further inference. 

In the interpretation of intelligence test results, it is common prac- 
tice to infer equal capacity from apparent evidences of equal abilities. 
The fallacies in this argument and the dangers of this step must be 
readily apparent. Equal capacities may be inferred from performance 
scores only when there is direct and positive evidence of two things: 
first, that the conditions under which the testing took place were 
identical and equally well controlled; second, that the training 
opportunities of the individuals compared have been equal. The 
mechanics of testing now make it fairly easy to control testing 
conditions. The second factor represents a real stumbling block in 
the way of an accurate and sane interpretation of intelligence test 
results. The naive manner in which some makers and many users of 
such tests assume equality of learning opportunity, and hence equal 
capacity from equal performance scores, is one of the things that 
has made many teachers and students skeptical of their value. 

The foregoing discussion of the meaning of a test score may 
appear to indicate that it is impossible to give meaning to any kind 
of test score. Such is not the intention, even though the purpose here 
is to emphasize the need for a conservative attitude in test score 
interpretation. In the long run, the more that is known about the 
variables underlying test scores, the more critical must the user 
become. The greatest damage that has been done to the field of 
educational measurements has come as a direct result of carelessness and 
ignorance on the part of users of tests, and their tendency to draw 
unwarranted conclusions from the results. The teacher should be able 
critically to select suitable tests and scales for classroom use, to 
control the mechanical conditions of their administration, and to 
draw sane and defensible conclusions and inferences from the results. 




Giving meaning to test scores 


The user of educational tests in the classroom is confronted with 
two types of test data for interpretation. The first type, and un- 
doubtedly the more common of the two, deals with the results of 
informal, teacher-made tests. The results from these classroom 
tests are in turn of two types—the subjective scores assigned by 
teachers to pupils’ responses to essay tests and the scores resulting 
from informal objective examinations. While something can be done 
to improve the interpretation of the relatively unreliable marks 
assigned to the discussion-type exercise, much more is possible in 
the accurate interpretation of the scores resulting from the use of 
objective examinations. Since one of the major functions of the 
standardization of a test is the establishment of meaning for the test 
scores, many additional types of interpretation are thus made pos- 
sible in the use of the second type of test data, which is obtained 
from the use of standardized tests. 


2 FORMAL TYPES OF DERIVED SCORES 


Function of derived scores 


Test scores used to describe the performance of pupils are ex- 
pressed in a variety of different units and in relation to a variety of 
different scales of measurement. In some tests the unit of 
measurement used may be relatively large. Pupil scores expressed in terms 
of these large units are often numerically small. A long test com- 
posed of many items may result in numerically large scores. It is 
thus apparent that a given score on one test may represent an 
exceptionally good performance while the same score on another 
test may represent an exceedingly poor performance. Some common 
basis must be established if comparisons of scores based on these 
widely different types of scales are to be possible. A number of 
methods have been developed for the calculation of derived scores 
which will partially take care of this difficulty. 


Relation of derived scores to norms 


Confusion may easily exist in the thinking of the student concern- 
ing the distinction between derived scores and norms. As a matter of 
fact, tables of norms often yield such derived scores as grade scores, 
age scores, or percentile ranks. The use of norm tables for obtaining 
such derived scores directly from raw scores or point scores on 
various tests is illustrated in Chapter 5 and is also discussed later in 
this chapter. Such ratios as the intelligence quotient, educational 
quotient, accomplishment quotient, and reading quotient are derived 
scores, but they are obtained by a division of one value by another. 
Some of these quotients are presented later in this chapter and 
others are treated in Chapter 10. 

Another possible source of confusion to the student lies in the fact 
that some tests provide a two-step procedure from raw scores to 
norms. In those cases, such derived scores as standard scores, scaled 
scores, converted scores, and equated scores are obtained from raw 
scores. Such derived scores have more meaning than do raw scores, 
but they do not always have final meaning for the interpretation of 
test results. Consequently, it is often necessary to enter a table of 
norms with the derived scores and to interpret them in terms of such 
other derived scores as grade scores, age scores, or percentile ranks. 
Situations of this type will also be discussed in a later section of 
this chapter. 

It is believed that the most satisfactory method of familiarizing 
the student with derived scores and norms is first to present the 
various types of derived scores, methods of computing them, and 
something of their meaning, and then to illustrate the need for and 
use of norms in the further interpretations required to make some 
of them meaningful. In the treatment that follows, three types of 
derived scores are distinguished: (1) those based on average or 
median performance, (2) quotients and related measures, and (3) 
those based on variability of performance. 


Derived scores based on average performance 


The two types of derived scores based on average or median per- 
formance are the grade score and the age score. These are directly 
dependent upon tables of norms, for it is only by entering norm 
tables with raw scores or some other forms of scores that grade 
equivalents or age equivalents can be determined. The meaning of 
grade scores and age scores is presented here and the use of tables 
of norms for their derivation is illustrated later in this chapter. 

Grade equivalents. A grade equivalent indicates the position on a 
grade scale at which a pupil’s test performance places him. For 
example, a child may attain a score on a reading test that is identical 
with the average or median score of pupils three months into the 
fourth grade. If so, his grade equivalent on the subject matter of 
the test is 4.3, regardless of whether he may be in the fourth grade 
or in some grade above or below the fourth. Grade scores are 
sometimes referred to as G-scores or, less frequently, as B-scores. Grade 
and months are commonly listed as a number and decimal 
respectively or as a number and exponent respectively. Thus the above 
grade equivalent might be stated either as 4.3 or as 4³. 

Age equivalents. In a manner very similar to that which operates 
for grade scores, age equivalents indicate the position on an age 
scale at which a pupil's test performance places him. The hypotheti- 
cal child whose reading test score gave him a grade equivalent of 4.3, 
for example, might be found by the use of a table of age norms to 
have an age equivalent of nine years eight months (9-8) on the same 
test. This would mean that his score was identical with the score 
made by the average or median pupil nine years and eight months 
of age. He might actually be a year or so older or younger; his age 
equivalent on the subject matter of the test would nevertheless be 
9-8. Age equivalents are represented by such terms as educational 
age (EA) for achievement over broad areas of subject matter, 
mental age (MA) for performance on general intelligence tests, and 
reading age (RA) for achievement in reading. Such ages are com- 
monly stated in hyphenated form, the first number indicating years 
and the second number months of age. Thus the EA of 9-8 indicates 
that in broad educational achievement the child used in the above 
illustration is at the same level as average children nine years and 
eight months of age. 

Although this book is most directly concerned with the types of 
age equivalents noted above, the same technique is applied to the 
measurement of other aspects of child growth and performance. For 
example, anatomical age, physiological age, and social age are com- 
parable terms that are employed with varying degrees of exactness in 
meaning. Chronological or life age is, of course, the most widely used 
of all, and is frequently employed as the basic or criterion measure 
of test validity, as will be pointed out in the following paragraphs. 


Quotients as derived scores 


Quotients and other similar derived scores show the relationship 
existing between two characteristics for the child as a means of 
indicating the manner in which growth of various types is related. 
For instance, the educational quotient, intelligence quotient, and 
reading quotient are the ratios of a child's educational, mental, and 
reading ages respectively to his chronological age. The accomplishment quotient is 
the ratio between a child's educational and mental ages. The first 
three are based on the idea that on the average a child grows in all 
ways more nearly in conformance with his chronological age than 
with any other measures, and also upon the recognition that devia- 
tions from that pattern of growth result from individual differences 
and are meaningful in the guidance of the child. The accomplish- 
ment or achievement quotient is based on the idea that the child's 
mental age is perhaps a better criterion by which to judge his 
educational growth than is his chronological age. All of these have 
been discussed in appropriate chapters elsewhere in this volume. 

Computation of the various quotients listed above will be 
illustrated for a pupil who has a chronological age (CA) of 8-4, an 
educational age (EA) of 9-2, a mental age (MA) of 9-7, and a reading age 
(RA) of 9-4. The last three ages would be determined in the manner 
indicated in the above section from his scores on a general 
achievement, a general intelligence, and a reading test. The quotients are 
all based on computations in which each age is reduced to months, 
and all ratios are multiplied by 100 to eliminate the use of decimals 
in the results. 

For the child whose various age levels or age equivalents are given 
above, his educational quotient (EQ) would be 


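$$\mathrm{EQ} = 100 \times \frac{\mathrm{EA}}{\mathrm{CA}} = 100 \times \frac{\text{9-2}}{\text{8-4}} = 100 \times \frac{110\ \text{(months)}}{100\ \text{(months)}} = 110,$$

his intelligence quotient (IQ) would be 

$$\mathrm{IQ} = 100 \times \frac{\mathrm{MA}}{\mathrm{CA}} = 100 \times \frac{\text{9-7}}{\text{8-4}} = 100 \times \frac{115\ \text{(months)}}{100\ \text{(months)}} = 115,$$

his reading quotient (RQ) would be 

$$\mathrm{RQ} = 100 \times \frac{\mathrm{RA}}{\mathrm{CA}} = 100 \times \frac{\text{9-4}}{\text{8-4}} = 100 \times \frac{112\ \text{(months)}}{100\ \text{(months)}} = 112,$$

and his accomplishment quotient (AQ) would be 

$$\mathrm{AQ} = 100 \times \frac{\mathrm{EQ}}{\mathrm{IQ}} = 100 \times \frac{110}{115} = 95.6 \text{ or } 96,$$

or 

$$\mathrm{AQ} = 100 \times \frac{\mathrm{EA}}{\mathrm{MA}} = 100 \times \frac{\text{9-2}}{\text{9-7}} = 100 \times \frac{110\ \text{(months)}}{115\ \text{(months)}} = 95.6 \text{ or } 96.$$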


These quotients indicate that the child is well above average for 
his age in intelligence and is somewhat less accelerated educationally. 
Within the limits of reliability for the AQ, discussed in some detail 
in Chapter 10, it appears that his achievement is not quite what 
might be expected of a child of his mental ability level. His reading 
quotient indicates a somewhat greater advancement in that subject 
than for the average of all other areas of achievement covered by the 
general achievement test from which his EA was determined. 
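
The quotient computations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `age_to_months` and `quotient` are assumptions introduced here, not part of any test manual, and the ages are those of the pupil in the example.

```python
def age_to_months(age: str) -> int:
    """Convert an age written as 'years-months' (for example '9-2') to months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)


def quotient(numerator_age: str, denominator_age: str) -> float:
    """Return 100 times the ratio of two ages, each reduced to months."""
    return 100 * age_to_months(numerator_age) / age_to_months(denominator_age)


# Ages of the pupil used in the illustration.
ca, ea, ma, ra = "8-4", "9-2", "9-7", "9-4"

eq = quotient(ea, ca)  # educational quotient: EA / CA
iq = quotient(ma, ca)  # intelligence quotient: MA / CA
rq = quotient(ra, ca)  # reading quotient: RA / CA
aq = quotient(ea, ma)  # accomplishment quotient: EA / MA

print(round(eq), round(iq), round(rq), round(aq))
```

Rounded to whole numbers, the four results agree with the values of 110, 115, 112, and 96 worked out above.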

With this brief presentation of the method of deriving the various 
commonly used quotients as a background, the student should be 
able to interpret these quotients adequately when he encounters 
them elsewhere in this volume. It should be understood that the RQ 
is merely representative of quotients that can be derived for the 
various subjects of the curriculum if age norms are given for such 
subjects on the standardized tests used. In practice, such quotients 
are seldom used except for reading and arithmetic, however. 


Derived scores based on variability of performance 


Derived scores based on variability of performance are of two 
types: (1) percentile ranks, and (2) scores that express position on 
a scale in units of the standard deviation or quartile deviation. Al- 
though these methods are similar in some respects, they differ in 
several fundamental ways which determine their effectiveness for 
certain types of uses. 

Percentile ranks are less reliable than are derived scores based on 
the standard deviation because they are more affected by minor 
irregularities in the distribution of scores upon which they are based 
than is the standard deviation. Percentile ranks cannot with strict 
validity be averaged, whereas averaging several scores similarly 
stated in terms of the S.D. is a defensible procedure. Measures of 
rank are based on equivalent areas under the curve, so that 
percentile ranks near the middle of the distribution, for example of 
48 and 49, usually represent closely similar raw scores, whereas 
percentile ranks near an extreme of the distribution, say of 2 and 3, 
may well represent raw scores differing by a number of points. On the 
other hand, derived scores based on the S.D. differ by equivalent 
distances along the scale, so that they represent merely the applica- 
tion of a new and more meaningful linear scale to a linear distance 
originally represented in raw-score units. 
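
The contrast between the middle and the extremes of a distribution can be illustrated numerically. The sketch below assumes, purely for illustration, a normal distribution with the mean of 48.96 and standard deviation of 10.86 reported for the 37 reading test scores; an actual set of scores will only approximate this form. The helper name `raw_score_at` is hypothetical.

```python
from statistics import NormalDist

# Assumption: scores follow a normal curve with the mean and S.D. of the
# 37 reading test scores. Real score distributions are only roughly normal.
scores = NormalDist(mu=48.96, sigma=10.86)


def raw_score_at(percentile: float) -> float:
    """Raw score below which the given percentage of cases falls."""
    return scores.inv_cdf(percentile / 100)


# Raw-score distance covered by one percentile step near the middle ...
gap_middle = raw_score_at(49) - raw_score_at(48)
# ... and by one percentile step near the lower extreme.
gap_extreme = raw_score_at(3) - raw_score_at(2)

print(round(gap_middle, 2), round(gap_extreme, 2))
```

Under this assumption one percentile step near the extreme spans several times as many raw-score points as one step near the middle, which is why percentile ranks are not equal units along the score scale.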

Percentile ranks. The test performance of a pupil may be expressed 
in terms of his position in the distribution of scores for pupils in his 
school grade, in a certain course he is taking, such as plane geometry, 
or who, with him, have studied a certain subject, such as a foreign 
language, for a given number of semesters. This is accomplished by 
dividing the distribution so that the various divisions contain the 
same percentage of the total number of cases. Common plans 
divide the distribution into fourths, fifths, tenths, or hundredths. The 
distribution is divided into fourths by computing the quartiles (Q1, 
median, and Q3); into fifths by computing the quintiles, i.e., 
percentiles 20, 40, 60, and 80; into tenths by computing the deciles, i.e., 
percentiles 10, 20, 30, ... 70, 80, and 90; and into hundredths 
(percentile ranks) by computing every percentile from 1 to 99 inclusive. 
It should be clear that pupils who rank in the second quarter from the 
top of a distribution have scores between Q3 and the median and that 
those who rank in the middle fifth have scores between the fortieth 
and sixtieth percentiles. Similarly, pupils who have a percentile rank 
of, say, 37, have scores that lie between the thirty-seventh and thirty- 
eighth percentiles. It is apparent from the foregoing that a percentile 
is a point on the scale and that a percentile rank represents an area 
lying between two adjacent percentiles. 

The teacher wishing to interpret test results is usually far more 
interested in percentile ranks than in percentiles, although there are 
occasions when he may wish to compute certain percentiles. The 
computation of percentiles will be treated briefly here, and a graphi- 
cal method of value in the estimation of percentile ranks will be pre- 
sented later in this chapter. 

Percentiles are computed in the same manner as was illustrated in 
Table 15, page 325, for Q1 is the twenty-fifth percentile and Q3 is the 
seventy-fifth percentile. The median, for which computation 
procedures are shown in Table 15, is also a percentile, the fiftieth. The 
same 37 reading test scores frequently employed above are again 
used in the frequency distribution of Table 19. The cumulative 
frequency column of the table is included as an aid in the 
computation of percentiles. Although percentiles near the top of a distribution 
are sometimes more easily computed by counting down from the top 
than by counting up from the bottom, the latter procedure will be 
employed in the illustrations here because of the manner in which 
the cumulative frequency column facilitates the computations. 


TABLE 19. Computation of deciles and percentiles for the grouped 
frequency distribution of 37 reading test scores 


Class interval    f    cf 
71-73             1    37 
68-70             2    36 
65-67             1    34 
62-64             1    33 
59-61             2    32 
56-58             2    30 
53-55             3    28 
50-52             4    25 
47-49             6    21 
44-46             5    15 
41-43             3    10 
38-40             2     7 
35-37             1     5 
32-34             2     4 
29-31             1     2 
26-28             0     1 
23-25             1     1 
                N = 37 

1. (1/10)N = .1 × 37 = 3.7 
   2 cases in and below c.i. 29-31 
   3.7 − 2 = 1.7 
   1.7 ÷ 2 = .85 
   .85 × 3 = 2.55 
   31.50 + 2.55 = 34.05 (P10 or D1) 

2. (9/10)N = .9 × 37 = 33.3 
   33 cases in and below c.i. 62-64 
   33.3 − 33 = .3 
   .3 ÷ 1 = .3 
   .3 × 3 = .9 
   64.50 + .9 = 65.40 (P90 or D9) 

3. (63/100)N = .63 × 37 = 23.31 
   21 cases in and below c.i. 47-49 
   23.31 − 21 = 2.31 
   2.31 ÷ 4 = .58 
   .58 × 3 = 1.74 
   49.50 + 1.74 = 51.24 (P63) 


Computations of three percentiles are illustrated in the table, but 
they will not be developed in detail here because they present nothing 
new to the student in the way of computational difficulties. Shown 
first in the table is the computation of the tenth percentile (P10), 
which is also known as the first decile (D1). The ninetieth percentile 
(P90), or ninth decile (D9), is shown in second position. These 
represent the procedures in computing deciles. The last illustration, to 
show procedures for percentiles in general, is for the sixty-third 
percentile (P63). 
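
The counting-up procedure can be sketched as a short Python function. The class intervals and frequencies listed in `INTERVALS` are an assumption: they are a reconstruction chosen to agree with the cumulative frequencies cited in the computations (2 cases in and below 29-31, 21 in and below 47-49, 33 in and below 62-64) and with the mean of 48.96 quoted for these scores, and should be checked against the printed table. The function name `percentile` is hypothetical.

```python
# (lower score limit of the class interval, frequency), lowest interval first.
# Assumed reconstruction of the grouped distribution of 37 reading scores.
INTERVALS = [
    (23, 1), (26, 0), (29, 1), (32, 2), (35, 1), (38, 2), (41, 3), (44, 5),
    (47, 6), (50, 4), (53, 3), (56, 2), (59, 2), (62, 1), (65, 1), (68, 2),
    (71, 1),
]
WIDTH = 3  # each class interval covers three score points


def percentile(p: float) -> float:
    """Interpolate the p-th percentile by counting up from the bottom."""
    n = sum(f for _, f in INTERVALS)
    target = p / 100 * n  # number of cases lying below the percentile point
    cumulative = 0        # cases in and below the intervals already passed
    for lower, f in INTERVALS:
        if f > 0 and cumulative + f >= target:
            real_lower_limit = lower - 0.5
            return real_lower_limit + (target - cumulative) / f * WIDTH
        cumulative += f
    raise ValueError("p must lie between 0 and 100")


for p in (10, 90, 63):
    print(p, round(percentile(p), 2))
```

The value for the tenth percentile agrees with the 34.05 obtained by hand. Small differences in the second decimal place for other percentiles arise because hand computation rounds the intermediate steps.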

Derived scores based on the standard deviation. A considerable 
number of derived scores have the standard deviation and arithmetic 
mean of a standard group of pupils as basic to their derivation. These 
various derived scores have different names, and some of them are 
devised for use with particular tests or series of tests. Although they 
differ widely in the manner in which the standard groups upon which 
they are based are selected, and make use of different numerical 
methods of representation, they have the element in common of being 
based upon the standard deviation. 

Methods of computing the arithmetic mean and the standard 
deviation are presented in Chapter 12. One of their major uses is 
that of providing one of the most satisfactory means of deriving 
meaningful scores from test results. The brief treatment of derived 
scores here shows the major types of such scores and the elements 
of similarity and difference among them. 

Standard measures or z-scores are mentioned briefly here because 
they represent such a simple method of showing deviation of a score 
from the arithmetic mean of the distribution and because of their 
similarity to other derived scores. However, the z-score is a measure 
used primarily in statistical procedures and has very little direct 
significance for the interpretation of test results to the teacher. The 
z-score is found by the application of the formula 


$$z = \frac{X - M}{\mathrm{S.D.}}$$


in which X is a particular raw score, M is the arithmetic mean of the 
distribution of raw scores, and S.D. is the standard deviation of the 
distribution of raw scores. It is sufficient here to point out that the 
z-score expresses deviation from the arithmetic mean in terms of 
standard deviation units and to give a few illustrations. For 
example, a z-score of +2.00 is two sigmas above the mean, a z-score 
of −2.00 is two sigmas below the mean, and a z-score of −.37 is .37 
S.D. below the mean. Therefore deviations from the mean can be read 
directly from z-scores. 

T-scores are similar to z-scores, except that they eliminate the use 
of negative values and decimals. A T-score of 50 was arbitrarily 
decided upon to represent a score at the arithmetic mean of a 
distribution and 10 T-score units were made equivalent to one standard 
deviation of distance. The formula for the T-score is 


$$T = 50 + \frac{10\,(X - M)}{\mathrm{S.D.}}$$


where X, M, and S.D. have exactly the same significance as they had 
in computing z-scores, that is, a particular raw score, the arithmetic 
mean, and the standard deviation. A score two sigmas above the 
mean has a T-score value of 70, a score two sigmas below the mean 
has a T-score value of 30, and a score .37 S.D. below the mean has a 
T-score equivalent of 46. Fractional values are not ordinarily used 
in T-scores. 
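
The relation between the two scores can be sketched in Python. The sketch assumes the form T = 50 + 10z, which follows from the description above (a mean of 50 and 10 T-score units to one standard deviation); the function names are hypothetical, and the mean and standard deviation are those reported for the 37 reading test scores.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Deviation of a raw score from the mean, in standard deviation units."""
    return (x - mean) / sd


def t_score(x: float, mean: float, sd: float) -> int:
    """T-score: mean set at 50, ten units per S.D., rounded to a whole number."""
    return round(50 + 10 * z_score(x, mean, sd))


MEAN, SD = 48.96, 10.86  # mean and S.D. of the 37 reading test scores

two_sigmas_above = MEAN + 2 * SD
two_sigmas_below = MEAN - 2 * SD
slightly_below = MEAN - 0.37 * SD

print(t_score(two_sigmas_above, MEAN, SD))  # score two sigmas above the mean
print(t_score(two_sigmas_below, MEAN, SD))  # score two sigmas below the mean
print(t_score(slightly_below, MEAN, SD))    # score .37 S.D. below the mean
```

The three results correspond to the T-score values of 70, 30, and 46 given in the text.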

Standard scores, scaled scores, equated scores, and converted scores 
are other types of derived scores that provide for comparability of 
scores on different parts of the same test or even on different tests. 
This is accomplished by changing raw scores to derived scores by 
methods differing somewhat from those described above but never- 
theless based on the mean and standard deviation for some standard 
group. 


Other types of derived scores 


Although the types of derived scores discussed here are those most 
commonly used, several miscellaneous types that do not fit into any 
of the categories above merit brief mention here. 

In the field of intelligence testing, the personal constant (PC) and 
the index of brightness (IB) are not mentioned above, but they are 
given adequate treatment in Chapter 10. Personality inventories in 
a few instances make use of the personality quotient (PQ), which is 
treated sufficiently in Chapter 11. A derived score that relates 
intelligence and achievement, the index of studiousness, is given 
attention in Chapter 10. 

The derived scores discussed in this chapter and elsewhere in the 
volume probably do not include all of the types or variations of such 
measures, for it is not uncommon to find a new test appearing with 
a new type of derived score. However, the types presented are the 
most widely used and the most important at the present time. 


3 INFORMAL TYPES OF DERIVED SCORES 


Derived scores of the types discussed above are used variously in 
interpreting results from standardized tests, and particularly for 
standardized educational and intelligence tests. Some of these same 
derived scores can be, and in fact often are, used in interpreting 
results from teacher-made or classroom tests. This applies 
particularly to percentile ranks but may also apply to T-scores. Three other 
methods of establishing comparability of results from teacher-made 
tests are discussed below: (1) relative ranks, (2) letter marks on a 
single test or measure, and (3) letter marks representing composite 
achievement covering a marking period, semester, or school year. 


Relative ranks 


In working with test scores it often becomes desirable to make the 
achievement of all pupils in the group the basis for comparison. The 
relative performances of pupils may be compared by the simple 
process of assigning ranks or positions to their scores in accordance 
with the magnitudes of the scores. Thus for a group of twelve pupils 
who took a certain arithmetic test and received twelve different 
scores, ranks of 1, 2, 3, ... 10, 11, and 12 are assigned in descending 
order of test scores. 

The only difficulty appears when two or more pupils have identical 
scores. The illustration of Table 20 will make clear the method of 
handling this situation. Pupils B and C, with scores of 44, are 
assigned the average of the two rank positions, 2 and 3, for which 
they are tied: (2 + 3) ÷ 2 = 2.5. Likewise, pupils F, G, and H, all with scores 
of 38, receive the rank of 7: (6 + 7 + 8) ÷ 3 = 7. 


TABLE 20. Assignment of relative ranks to arithmetic test scores 


It should be clear that the assignment of relative ranks or positions 
to pupils having certain test scores actually covers up the true 
situation to some degree. For example, the difference in magnitude 
of scores is submerged by relative ranks. To illustrate, in the data of 
Table 20 a difference of six score points on the test (31 to 37) makes 
a difference of only one rank position (9 to 10) in one situation. But 
a difference of only one score point (28 to 29) also makes a difference 
of one rank position elsewhere. This is a point that should be kept 
in mind when using the method of relative ranks. Ranking shows 
that one pupil is above or below another pupil but fails to indicate 
how much in terms of actual score differences he is above or below 
the other pupil. 
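
The assignment of relative ranks, including the averaging of tied positions, can be sketched as follows. The scores for pupils B, C, F, G, and H and the scores of 37, 31, 29, and 28 come from the discussion above; the scores given to pupils A, D, and E, and the labels I through L, are hypothetical values chosen only to complete a group of twelve. The function name `relative_ranks` is likewise an assumption.

```python
def relative_ranks(scores: dict[str, int]) -> dict[str, float]:
    """Rank scores in descending order; tied scores share the average rank."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks: dict[str, float] = {}
    position = 0
    while position < len(ordered):
        # Find the last pupil tied with the pupil at this position.
        tie_end = position
        while (tie_end + 1 < len(ordered)
               and ordered[tie_end + 1][1] == ordered[position][1]):
            tie_end += 1
        # Rank positions are 1-based; average the first and last tied positions.
        average_rank = ((position + 1) + (tie_end + 1)) / 2
        for name, _ in ordered[position:tie_end + 1]:
            ranks[name] = average_rank
        position = tie_end + 1
    return ranks


# Scores for A, D, and E are hypothetical; the rest follow the text.
arithmetic_scores = {
    "A": 47, "B": 44, "C": 44, "D": 42, "E": 40, "F": 38,
    "G": 38, "H": 38, "I": 37, "J": 31, "K": 29, "L": 28,
}

ranks = relative_ranks(arithmetic_scores)
for pupil, score in arithmetic_scores.items():
    print(pupil, score, ranks[pupil])
```

In the output, a six-point gap (37 to 31) and a one-point gap (29 to 28) each amount to a single rank position, which is the loss of information described above.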

The usefulness of this method is also somewhat limited by the fact 
that it takes no account of the actual level at which the accomplish- 
ment takes place. A person ranking 32 in a group of 35 has a very 
low relative rank in the group. However, if he ranked 32 in a group 
of 250, the significance of his accomplishment would be greatly 
changed. Percentile ranks, however, take this point into account by 
reducing the ranking to a basis of 100 units. A percentile rank of 75 
means that for the measures under consideration the individual made 
a higher score than 75 per cent of the individuals in his group without 
regard to the number of cases it contains. 
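
The effect of group size on the meaning of a rank can be shown with a brief sketch. It uses a simplified assumption, namely that the percentage of the group ranking below a pupil is taken directly from his rank with no ties considered; formal percentile-rank procedures differ in detail. The function name `percent_below` is hypothetical.

```python
def percent_below(rank: int, group_size: int) -> float:
    """Percentage of the group ranking below a pupil (rank 1 is highest).

    Simplified: ties are ignored and no mid-interval correction is applied.
    """
    return 100 * (group_size - rank) / group_size


small_group = percent_below(32, 35)    # rank 32 in a group of 35
large_group = percent_below(32, 250)   # rank 32 in a group of 250

print(round(small_group, 1), round(large_group, 1))
```

The same rank of 32 places the pupil above fewer than one tenth of a group of 35 but above well over four fifths of a group of 250.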


Letter marks on a test 


One of the major uses of the standard deviation, as a measure basic 
to certain important types of derived scores, is treated in a preceding 
section of this chapter. The standard deviation has many other uses 
not directly related to formal derived scores, however, and one of 
them is important enough to justify attention here. 

The student or teacher who is interested in the critical analysis of 
test scores will find the standard deviation a very useful measure in 
assigning letter marks to classroom test scores. The importance of 
this practice is sufficiently great that the steps involved in the 
technique are given in some detail. The computations described are 
based upon the 37 reading test scores used in the preceding compu- 
tational illustrations and originally listed in Table 9, page 310. The 
steps of procedure used in assigning A, B, C, D, and F marks to the 
37 pupils are outlined below by the use of the arithmetic mean and 
standard deviation of the scores. 


1. Obtain the arithmetic mean and standard deviation. The arithmetic 
mean, shown in Table 14, page 320, is 48.96, and the standard devi- 
ation, shown in Table 18, page 334, is 10.86. 

2. Mark off distances of .5 S.D. above and below the mean. The first 
of the points is 48.96 + (10.86 ÷ 2), or 54.39. The other point is 
48.96 − (10.86 ÷ 2), or 43.53. These two points are respectively 
at the upper and lower ends of the C mark range. 

3. Mark off distances of 1.5 S.D. above and below the mean. The first 
of these points is 48.96 + (10.86 × 1.5), or 65.25. The other is 
48.96 − (10.86 × 1.5), or 32.67. These two points separate the 
A and B and the D and F marks respectively. 

4. Establish the score limits of the letter marks. From these values set 
up a table showing the score limits of each mark. It should be noted 
that no upper limit for the A mark and no lower limit for the F mark 
are set. 

Mark    Score Limits 
A       65.25 and above 
B       54.39 to 65.25 
C       43.53 to 54.39 
D       32.67 to 43.53 
F       32.67 and below 


Reference to Table 10, page 310, where the 37 original scores are 
listed in descending order, will disclose that 3 A, 8 B, 16 C, 7 D, and 
3 F marks are assigned by this method. The percentages of the 37 
marks at each level are: A, 8; B, 22; C, 43; D, 19; and F, 8. These 
are close to the 38 per cent of Cs, 24 per cent each of Bs and Ds, and 
7 per cent each of As and Fs which have a tendency to result if the 
distribution contains a rather large number of cases and if it approxi- 
mates a normal distribution in form. 
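
The four steps and the normal-curve percentages can be checked with a short Python sketch. The function names are hypothetical, and the treatment of a score falling exactly on a limit (assigned here to the higher mark) is an assumption, since the table of score limits leaves that case open.

```python
from statistics import NormalDist

MEAN, SD = 48.96, 10.86  # mean and S.D. of the 37 reading test scores


def cut_points(mean: float, sd: float) -> dict[str, float]:
    """Score limits separating adjacent letter marks."""
    return {
        "A/B": round(mean + 1.5 * sd, 2),
        "B/C": round(mean + 0.5 * sd, 2),
        "C/D": round(mean - 0.5 * sd, 2),
        "D/F": round(mean - 1.5 * sd, 2),
    }


def letter_mark(score: float, mean: float, sd: float) -> str:
    """Assign a mark from the score's distance from the mean in S.D. units."""
    z = (score - mean) / sd
    if z >= 1.5:   # assumption: a score exactly on a limit takes the higher mark
        return "A"
    if z >= 0.5:
        return "B"
    if z >= -0.5:
        return "C"
    if z >= -1.5:
        return "D"
    return "F"


def normal_percentages() -> dict[str, int]:
    """Percentage of cases expected at each mark under a normal curve."""
    unit = NormalDist()  # standard normal curve: mean 0, S.D. 1
    a = 1 - unit.cdf(1.5)
    b = unit.cdf(1.5) - unit.cdf(0.5)
    c = unit.cdf(0.5) - unit.cdf(-0.5)
    return {"A": round(100 * a), "B": round(100 * b), "C": round(100 * c),
            "D": round(100 * b), "F": round(100 * a)}


print(cut_points(MEAN, SD))
print(normal_percentages())
```

The computed limits match those in the table of score limits, and the normal-curve percentages match the 7, 24, 38, 24, and 7 per cent cited above.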

It is readily apparent that practically no subjective factors are 
involved in the assignment of marks by this method. The score limits 
are determined by the standard deviation units and would be the 
same no matter who assigned the marks. It should be noted, however, 
that these limits hold only for this particular distribution and must 
not be assumed to be true for any other test. The teacher should also 
remember that this method of marking does not take into account 
the absolute level of ability at which a particular class works. The 
superior pupil in an average or poor class receives an A by this 
method just as readily as does the superior pupil in a very superior 
class. This is probably less serious than it sounds, however, for most 
class groups large enough to warrant the application of this technique 
average out quite well in this respect. 


Final letter marks 


Final marks summarizing all of the quiz and test scores, and even 
including all of the more subjective marks such as those on themes, 
term papers, and notebooks, can be obtained quite readily for the 
work of a marking period, semester, or school year. Various methods 
for accomplishing this objectively have been employed. One of the 
best and simplest procedures involves the use of A, B, C, D, and F 
marks for stating results from each factor that is to receive considera- 
tion in determining the final marks. For valid and quite reliable 
measures, such as scores on carefully constructed objective tests, plus 
and minus marks in each letter category can be applied by a simple 
extension of the method outlined immediately above. For example, 
high C marks would be rated C+, low C marks would be assigned 
C-, and the intervening marks would be designated as C. For less 
reliable ratings, such as marks on themes and term papers, the five- 
point scale consisting of A, B, C, D, and F is doubtless prefer- 
able. 

It is possible not only to use more discrimination in weighting 
highly reliable and valid scores than is used for more subjective 
measures, as is illustrated above, but also to weight the various 
factors entering into the determination of final marks according to 
their estimated importance. This is accomplished by using a weight- 
ing of 1 for least significant results, a weighting of 2 for results of 
intermediate importance, and a weighting of 3 for measures judged 
to be most important. Higher weights can easily be obtained if 
desired by an extension of Table 21. 


TABLE 21. Suggested weightings for marks in obtaining composite 
scores 


A simple illustration employing only three measures will suffice 
for showing how this method is applied. If a certain pupil has a B— 
on a mid-semester test, a C on his term paper, and a final examination 
mark of C+, and if the three measures are to be weighted 2, 1, and 3 
respectively, his mid-semester test weighting is 20, his term paper 
weighting is 7, and his final examination weighting is 24. The sum 
of 20, 7, and 24, or 51, is his weighted composite score for total per- 
formance. When similar composites are obtained for the other pupils 
in the class, a distribution of the weighted composite scores can be 
made. It is then possible to assign final course marks by use of the 
method outlined above or by some modification of it. However, since 
no marking system should be rigidly defined, departures from any 
system of attaining objectivity should be made when conditions 
warrant. 
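
The illustration above can be checked with a short Python sketch. It assumes that each tabled weighting equals a unit value for the mark multiplied by the weight. This is consistent with the figures in the example (20 / 2, 7 / 1, and 24 / 3) but is not confirmed by the table itself, so the unit values below should be read as inferred. The names are hypothetical.

```python
# Unit values per mark: assumed, inferred from the worked example
# (20 / 2 = 10, 7 / 1 = 7, 24 / 3 = 8), not copied from Table 21.
UNIT_VALUE = {"B-": 10, "C": 7, "C+": 8}


def weighted_composite(marks_and_weights):
    """Sum of (unit value of the mark) x (weight) over all measures."""
    return sum(UNIT_VALUE[mark] * weight for mark, weight in marks_and_weights)


# Mid-semester test, term paper, and final examination, in that order.
pupil = [("B-", 2), ("C", 1), ("C+", 3)]
composite = weighted_composite(pupil)
print(composite)
```

The same function applies to any number of measures once the unit values for the remaining marks are taken from the table.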


Exercises in Computing Derived Scores 


13. Assign T-scores to the raw scores of 55, 66, 44, 79, and 35 from the 
40 arithmetic test scores of Problem 1, using the values of the 
arithmetic mean and standard deviation found in Problem 4, page 
323, and Problem 11, page 337, respectively. 

14. Compute the 40th and the 63rd percentiles for the 40 arithmetic 
test scores tabulated in a frequency distribution for Problem 1, 
page 317. 

15. Assign relative ranks to the 30 spelling test scores of Problem 3, 
page 317. 

16. Determine limits for A, B, C, D, and F marks for the 40 arithmetic 
test scores of Problem 1, using the values of the arithmetic mean 
and standard deviation found in Problem 4, page 323, and Problem 
11, page 337, respectively. 

17. Find the weighted composite score for a pupil who has C+, B, C, 
and C- marks respectively on a mid-semester test, a semi-final 
test, a term paper, and a final examination when the first two tests 
are weighted one each, the term paper is weighted two, and the 
final examination is weighted three. 


4 GRAPHICAL REPRESENTATION 


Various methods of graphical representation have value in the 
interpretation of results from educational measurements. These range 
from simple graphs and charts to complex and involved represen- 
tations and to such popular techniques as the pictograph. Three types 
of graphic representations seem to be most widely useful to the 
classroom teacher. Accordingly, only these three types will be 
treated here: (1) the frequency polygon, (2) the histogram, and 
(3) the cumulative frequency graph. 


Frequency polygon 


The most widely used and the most easily constructed and com- 
prehended form of graphical representation is the frequency polygon. 
When a normally distributed trait has been measured for a large 
number of pupils and when the data are grouped into a rather large 
number of class intervals, say 25 to 30, the resulting frequency 
polygon closely resembles in shape the smoothed normal curve 
pictured in Figure 28, page 331. A frequency polygon has major 
values in showing individual differences among the pupils in a certain 
class. Impressions concerning the range of scores, the shape of the 
distribution, and an approximation to a measure of the central 
tendency of the scores can readily be formed from this type of graph. 
When certain points in the distribution, such as the median and 
quartiles, are designated with reference to the score scale, this type 
of graphical representation assumes much more meaning than does 
a frequency distribution of the same scores. 


TABLE 22. Class frequencies, cumulative frequencies, and cumulative 
relative frequencies for 37 reading test scores 

Frequency column: 1, 1, 1, 2, 2, 3, 4, 6, 5, 3, 2, 1, 2, 1, 0, 1 

Reproduced in Table 22 is the grouped frequency distribution of 
the 37 reading test scores presented in their original form in Table 
9, page 310, and presented in the form of a grouped frequency 
distribution in Table 13, page 314. Only the class-interval and 
frequency columns of this table are used in the construction of the 
frequency polygon; the right-hand columns will be used later in 
constructing a cumulative frequency graph for the same data. Figure 
29 depicts this score distribution in the form of a frequency polygon. 
Values of the P10, Q1, median, Q3, and P90 are shown on the graph 
as an indication of how the polygon may be given additional 
meaning. 
Fig. 29. Frequency polygon of 37 reading test scores (horizontal axis: midpoints of class intervals) 


The steps of procedure for constructing a frequency polygon on 
squared paper are as follows: 


1. Rule left and bottom marginal lines. Rule two straight lines, per- 
pendicular to each other, to establish the left and bottom edges of 
the graph proper. 

2. Establish and indicate values on the score scale. Lay off at equal 
distances along the base line and on rulings of the squared paper the 
midpoints of class intervals, starting somewhat to the right of the 
vertical line with the midpoint of the interval next below the lowest 
in the distribution and continuing toward the right to the midpoint of 
the interval next above the highest in the distribution. Designate 
the values of these midpoints immediately below the base line. 

3. Establish and indicate values on the frequency scale. Lay off on the 
left vertical line successive units to represent the frequencies of the 
different class intervals. Select a unit that will result in a height for 
the interval having the greatest frequency of perhaps two-thirds to 
three-fourths of the width of the figure. Designate values on the 
frequency scale immediately to the left of the vertical line. 

4. Complete the margins of the graph. Rule a horizontal line across the 
top of the graph somewhat above the highest value on the frequency 
scale and rule a vertical line from a point somewhat to the right of 
the highest midpoint to complete the graph. 

5. Establish frequencies on the vertical scale. At the midpoint of each 
interval along the base line, count up a distance along the frequency 
scale equal to the number of scores in the class interval and place a 
mark above the midpoint of the interval to indicate this height. 

6. Complete the frequency polygon. Join the marks obtained in the 
preceding step with straight lines to give the frequency surface, ex- 
tending to the base line at the midpoints of any intervals of zero 
frequency and at the midpoints of the intervals next below the lowest 
and next above the highest in the distribution. 
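
The points to be plotted in steps 2, 5, and 6 can be listed with a brief Python sketch. The midpoints and frequencies used here are hypothetical and are not the data of Table 22. The function name is illustrative, and class intervals are assumed to be of equal width.

```python
def frequency_polygon_points(midpoints, frequencies):
    """Return the (midpoint, frequency) vertices of a frequency polygon.

    The polygon is brought down to the base line by adding a
    zero-frequency vertex at the midpoint of the interval next below
    the lowest and of the interval next above the highest.
    """
    width = midpoints[1] - midpoints[0]  # equal class intervals assumed
    points = [(midpoints[0] - width, 0)]
    points.extend(zip(midpoints, frequencies))
    points.append((midpoints[-1] + width, 0))
    return points


# Hypothetical distribution of 37 scores in intervals five units wide.
midpoints = [22, 27, 32, 37, 42, 47, 52, 57]
frequencies = [1, 3, 6, 9, 8, 6, 3, 1]

polygon = frequency_polygon_points(midpoints, frequencies)
for x, f in polygon:
    print(f"{x:>3} {'*' * f}")
```

Joining successive vertices with straight lines gives the frequency surface. An interval of zero frequency within the distribution appears as a vertex on the base line without any special handling.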


Histogram 


The histogram has values similar to those of the frequency poly- 
gon, although it differs from the polygon in general appearance and 
in construction. Two histograms cannot satisfactorily be superim- 
posed, and the histogram gives an impression of discontinuity of 
frequencies from interval to interval. It is the type of simple graph 
that closely resembles many of the pictographs in which silhouettes 
of figures are arranged in rows or columns of varying lengths to 
represent different frequencies. The frequency distribution of 37 
reading test scores, reproduced in Table 22, is used to illustrate the 
construction of the histogram shown in Figure 30. Again the two 
right-hand columns of the table are not used in the procedure. 

Although the construction of a histogram is similar to that of a 
frequency polygon, several minor differences in the early stages and 
major differences in the late stages warrant a separate list for the 
steps of procedure. 


1. Rule left and bottom marginal lines. Follow the procedure given 
above in the first step for constructing a frequency polygon. 

2. Establish and indicate values on the score scale. Lay off at equal dis- 
tances along the base line and midway between rulings of the squared 
paper the midpoints of class intervals from left to right, starting 
somewhat to the right of the vertical line with the midpoint of the 
lowest interval and continuing to the midpoint of the highest inter- 
val. Designate the values of these midpoints immediately below the 
base line. 

3. Establish and indicate values on the frequency scale. Follow the 
procedure given above in the third step for constructing a frequency 
polygon. 

4. Complete the margins of the graph. Follow the procedure given above 
in the fourth step for constructing a frequency polygon. 

5. Establish frequencies on the vertical scale. Follow the procedure given 
above in the fifth step for constructing a frequency polygon. 

6. Complete the histogram. Rule horizontal lines from the lower to the 
upper limits of each class interval at the points where the marks of 
the last step of procedure were made. Complete the enclosure of the 
figure by ruling in the connecting vertical lines. Extend to the base 
line only at the upper and lower ends of the distribution and above 
and below class intervals having zero frequencies. 
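
The final step may be easier to follow when the corners of the figure are listed in order. The Python sketch below is illustrative only: the midpoints and frequencies are hypothetical, the function names are not from the text, and equal class intervals are assumed.

```python
def histogram_bars(midpoints, frequencies):
    """Return (lower limit, upper limit, height) for each class interval."""
    half = (midpoints[1] - midpoints[0]) / 2  # equal class intervals assumed
    return [(m - half, m + half, f) for m, f in zip(midpoints, frequencies)]


def histogram_outline(bars):
    """Return the corners of the histogram, traced from left to right.

    Each bar contributes a horizontal line from its lower to its upper
    limit. Consecutive corners that share a score value form the
    connecting vertical lines. The outline touches the base line at both
    ends and across any interval of zero frequency.
    """
    points = [(bars[0][0], 0)]
    for lower, upper, height in bars:
        points.append((lower, height))
        points.append((upper, height))
    points.append((bars[-1][1], 0))
    return points


# Hypothetical data: three intervals, the middle one empty.
bars = histogram_bars([22, 27, 32], [2, 0, 3])
outline = histogram_outline(bars)
print(outline)
```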


Fig. 30. Histogram of 37 reading test scores 


Cumulative frequency graph 


This form of graphical representation does not reveal the major 
characteristics of a score distribution as clearly as do the frequency 
polygon and histogram. Therefore, it is not suitable for use in pre- 
senting test results graphically to point up individual differences 
and group characteristics. This characteristic of the cumulative 
frequency graph results from the fact implied in its name—it shows 
cumulative frequencies rather than frequencies by class intervals. 
In doing so, it fails to reveal the shape of the curve and minor 
irregularities in it as well as do the frequency polygon and histogram. 

The cumulative frequency graph has major significance in facilitat- 
ing the estimation of quartiles, quintiles, deciles, and percentiles. 
This characteristic also makes possible the estimation of percentile 
ranks for given test scores. As has been shown above, percentile 
ranks are obtained by dividing the distribution into one hundred 
equal parts with respect to the area under the curve, i.e., in terms 
of score frequencies, by use of percentiles 1 to 100 inclusive. The 
37 reading test scores used in the illustration here cannot readily be 
divided into that number of equal parts. Hence, cumulative relative 
frequencies are used in constructing the figure so that the frequency 
scale will be divided into the one hundred equal parts necessary for 
ready estimations of percentiles and percentile ranks. The last two 
columns of Table 22 show the 37 reading test scores cumulated 
upward by class intervals and the cumulated relative frequencies of 
the scores. The cumulative relative frequency column shows the 
percentage of the 37 scores lying in and below each interval from 
the lowest, for which the fraction is 1/37, or 2.7 per cent, to the 
highest, for which the fraction of 37/37 is 100.0 per cent. 

The steps of procedure for constructing a cumulative frequency 
graph are given below. Figure 31 illustrates the application of this 
method to the 37 reading test scores of Table 22. 


1. Rule left and bottom marginal lines. Follow the procedure given on 
page 357 in the first step for constructing a frequency polygon. 


2. Establish and indicate values on the score scale. Lay off at equal dis- 
tances along the base line and on the rulings of the squared paper 
the real limits of class intervals, starting with the lower real limit of 
the lowest interval and continuing to the right until the higher real 
limit of the highest interval is reached. Designate the values of these 
real limits immediately below the base line. 


3. Establish and indicate values on the relative frequency scale. Lay off 
on the left vertical line a scale in 100 parts to represent the cumu- 
lative relative frequencies. Select a unit such that the height of the 
graph will be roughly equal to or even somewhat greater than its 
width. Designate values on this vertical scale in multiples of ten 
or even of five. 


4. Complete the margins of the graph. Rule a horizontal line starting at 
the 100 point on the vertical scale and a vertical line starting with 
the higher real limit of the highest interval in the distribution to 
complete the margin of the graph. 


5. Establish cumulative relative frequencies on the vertical scale. Place 
a mark at the upper real limit of each class interval opposite the 
percentage on the vertical scale that shows the cumulative relative 
frequency in and below the interval. 


6. Complete the cumulative frequency graph. Join these points suc- 
cessively by straight lines, starting with the point where the left 
vertical and base lines meet and continuing to the right and upward 
until the point where the right vertical and top horizontal lines meet 
is reached. 
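
The points plotted in steps 2, 5, and 6 can be computed as in the following Python sketch. The frequencies are hypothetical, chosen only so that they total 37 and begin with a single score in the lowest interval. The real limits and the function name are likewise assumptions made for illustration.

```python
def cumulative_graph_points(lowest_real_limit, width, frequencies):
    """Return (real limit, cumulative percentage) points for the graph.

    The first point lies where the left vertical and base lines meet.
    Each later point lies at the upper real limit of a class interval,
    opposite the percentage of scores in and below that interval.
    """
    total = sum(frequencies)
    points = [(lowest_real_limit, 0.0)]
    running = 0
    for i, f in enumerate(frequencies, start=1):
        running += f
        points.append((lowest_real_limit + i * width, 100.0 * running / total))
    return points


# Hypothetical frequencies for 37 scores, lowest interval first.
frequencies = [1, 3, 6, 9, 8, 6, 3, 1]
points = cumulative_graph_points(19.5, 5, frequencies)
for limit, pct in points:
    print(f"{limit:5.1f} {pct:6.1f}")
```

With these figures the lowest interval accounts for 1/37, or 2.7 per cent, and the highest brings the total to 100.0 per cent, as described above for the cumulative relative frequency column.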


Fig. 31. Cumulative frequency graph of 37 reading test scores (horizontal axis: real limits of class intervals; vertical axis: cumulated relative frequencies) 

It is suggested that the student use the cumulative frequency 
graph of Figure 31 to check the consistency of estimates based on it 
with results obtained above by computation for certain quartiles, 
deciles, and percentiles. A median of 48.24 was obtained for these 37 
reading test scores in Table 15, page 325. Values of 65.40, 34.05, and 
51.24 were given in Table то, page 348, for Dss, D10, and P;; respec- 
tively. If, for example, it is desired to check on the twenty-fifth 
percentile, or Q1, the procedure is to use a rule or straight edge 
horizontally in proceeding from the reading of 25 on the vertical 
scale to the curve itself and from that point on the curve to rule 
vertically downward with the straight edge to the score scale. The 
value on the score scale at this point should very closely approximate 
the 42.75 obtained computationally for Q1. 

The illustration above shows one of the uses of the cumulative 
frequency graph, i.e., to estimate percentiles in a distribution of 
scores. As this method is not highly accurate unless the graph is 
drawn with great care and on quite a large dimensional scale, it is 
recommended that such important percentiles as the median, Q1, and 
Q3 at least be computed by the methods outlined above when their 
use is indicated and that this procedure be used in other instances 
when only approximate values of certain percentiles are desired. 

A more important and significant use of the cumulative frequency 
graph requires in effect a reversal of the procedure for estimating 
certain percentiles. This second use is in estimating the rank of a 
certain score or of certain scores in terms of fourths, fifths, tenths, or 
hundredths. Such estimates by hundredths are in terms of the percen- 
tile ranks discussed above. If, for example, it is desired to obtain 
the percentile rank of a score of 46, a rule or straight edge is applied 
vertically to the score value of 46 on the base line and is then, at the 
point where it crosses the curve, used horizontally to read the 
percentile equivalent on the left vertical scale. This percentile 
equivalent is approximately 38. Therefore, the percentile rank of 
the score of 46 is 38. This discloses that the score of 46 is, reading 
downward in each instance, in the third quarter, in the fourth fifth, 
and in the seventh tenth of the distribution of scores. 

It should be apparent from the above that close approximations 
to desired percentiles and also to ranks of given scores in terms of 
hundredths (percentile ranks), tenths, fifths, and fourths can readily 
be obtained from a carefully constructed cumulative frequency 
graph. The degree of accuracy in the estimates obtained is sufficient 
for most practical purposes and the time and labor saved when a 
number of such values are to be computed is great. 
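
Reading the graph with a straight edge amounts to linear interpolation between the plotted points, which can be sketched in Python as follows. The points used here are hypothetical and do not come from Figure 31. The function names are illustrative only.

```python
def percentile(points, p):
    """Score below which p per cent of the scores lie (read across, then down)."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= p <= c1 and c1 > c0:
            return x0 + (x1 - x0) * (p - c0) / (c1 - c0)
    raise ValueError("p must lie between 0 and 100")


def percentile_rank(points, score):
    """Cumulative percentage at a given score (read up, then across)."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= score <= x1:
            return c0 + (c1 - c0) * (score - x0) / (x1 - x0)
    raise ValueError("score lies outside the distribution")


# Hypothetical (real limit, cumulative percentage) points.
points = [(29.5, 0.0), (39.5, 10.0), (49.5, 35.0), (59.5, 75.0), (69.5, 100.0)]

q1 = percentile(points, 25)
rank = percentile_rank(points, 54.5)
print(q1, rank)
```

For these hypothetical points the twenty-fifth percentile is 45.5 and the percentile rank of a score of 54.5 is 55. The two functions are inverses of each other along the curve, which is the reversal of procedure described above.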


Exercises in Constructing Graphs 


18. Construct a frequency polygon on squared paper for the 40 arith- 
metic test scores tabulated in a frequency distribution for Problem 
1, page 317. 

19. Construct a histogram on squared paper for the 30 language test 
scores tabulated in a frequency distribution for Problem 2, page 317. 

20. Construct a cumulative frequency graph on squared paper for the 
40 arithmetic test scores tabulated in a frequency distribution for 
Problem 1, page 317. 


Exercises in Estimating Percentiles and Percentile Ranks 


21. Estimate the 40th and the 63rd percentiles for the 40 arithmetic 
test scores of Problem 1 from the cumulative frequency graph con- 
structed for Problem 20. 

22. Estimate the percentile ranks of raw scores of 37 and 56 in the 40 
arithmetic test scores of Problem 1 from the cumulative frequency 
graph constructed for Problem 20. 


5 MAJOR TYPES OF NORMS 


In the interpretation of results from standardized tests, tables of 
norms nearly always have direct value. Norms of one of the three 
types discussed in Chapter 5 are ordinarily provided with such tests: 
(1) grade norms, (2) age norms, or (3) percentile norms. As has 
been pointed out in a previous section of this chapter, derived scores 
and norms overlap in various ways. In the section of this chapter 
devoted to derived scores, the various types of meaningful scores 
that can be derived either from tables of norms for standardized tests 
or from statistical manipulations of scores from standardized and 
informal objective tests were treated. Only those derived scores enter 
into the brief discussion of this section that are related to norms in 
one of two ways: (1) as results from the use of norm tables—grade 
scores, age scores, and percentile ranks, or (2) as scores intermediate 
between raw scores and final derived scores, e.g., standard scores, 
equated scores. 

The tremendous variety of forms in which norm tables are pre- 
sented for different standardized tests, whether of intelligence, 
achievement, or personality, makes it impossible to represent here 
all of the variations found in such tables. Furthermore, the purpose 
here is only to familiarize the student sufficiently with the nature, 
form, and use of norms that he will be able to employ them properly 
in the interpretation of results from any standardized test he may 
have occasion to use. The discussion of norms for achievement tests 
in Chapter 5 and careful reading of instructions in test manuals 
should equip the teacher for effective use of such test norms as he 
is likely to encounter either in elementary or in high-school testing. 


TABLE 23. Mental age norms for the Pintner General Ability Tests, 
Verbal Series ¹ 


1 Directions for Administering and Scoring: Pintner General Ability Tests, Inter- 
mediate and Advanced. World Book Co., Yonkers, N. Y., 1938. Table 1, p. 5. 
TABLE 24. Grade and age norms for the California Language Test ² 

Column headings: Grade Placement; Mech. Eng. and Gram.; Spelling; Total Language; Age in Months 

2 Ernest W. Tiegs and Willis W. Clark, Manual for the California Language 
Test, Intermediate. California Test Bureau, Los Angeles, 1950. p. 20. 




TABLE 25. Percentile norms for the Cooperative Mechanics of Expression 
Test, A ³ 


End-of-Year Norms in Terms of Scaled Scores 

Column headings: Scaled Score; Percentiles * by Grade 

Scaled scores listed: 76, 74, 72, 70, 68, 66, 64, 62, 60, 58, 56, 54, 52, 50, 48, 46, 44, 42, 40, 38, 36, 34 

Mean 35.2 39.6 43.4 47.2 50.5 53.3 
S.D. 8.3 8.6 8.7 8.9 9.2 9.5 


* The percentile values in the tables are those closest to the actual Scaled Scores 
listed. Interpolation may be used to obtain the closest percentiles for odd-numbered 
Scaled Scores. 


3 Secondary School Norms: Cooperative English Test. Single Booklet Edition, 
Higher and Lower Levels. Cooperative Test Division, Educational Testing Service, 
New York. Norms for Public Secondary Schools of the East, Middle West, and West, 
Diez. 
TABLE 26. Percentile norms for the Aspects of Personality Inventory ⁴ 
For Grades 7, 8, and 9 


PERCENTILE RANK CORRESPONDING TO GIVEN SCORE 

Column headings: Score on Any Test; Boys (Sec. I, A-S; Sec. II, E-I; Sec. III); Girls (Sec. I, A-S; Sec. II, E-I; Sec. III) 

Scores listed: 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17 


4 Rudolf Pintner and others, Aspects of Personality: Manual of Directions. 
World Book Co., Yonkers, N. Y., 1939. Table 5, p. 7. 
Exercises in Using Test Norms 


23. Using the mental age norms reproduced in Table 23 for the Pintner 
General Ability Tests, Verbal Series, determine the mental ages and, 
using the quotient method, the intelligence quotients of the pupils 
whose median standard scores and chronological ages are shown 
below. 

Pupils | Median Standard Scores | Chronological Ages | Pupils | Median Standard Scores | Chronological Ages 
A | 172 | 15-0 | D | 165 | 15-0 
B | 105 | 6-8 | E | 182 | 12-9 
C | 107 | 8-6 | F | 133 | 10-2 


24. Using the grade placement and age norms reproduced in Table 24 
for the California Language Test, determine the grade equivalents 
and age equivalents (in years and months) on the two parts and the 
total test for the eighth-grade pupils whose scores are shown below. 


Test 


Mechanics of English and Grammar | 44 | 50 | 42 | 15 | 55 | 49 
Spelling ВТО 3 |2r | 13 
Total Language 


25. Using the percentile norms reproduced in Table 25 for the Co- 
operative English Mechanics Test, determine the percentile rank 
in his grade group for each of the pupils whose end-of-year scaled 
scores are shown below. 

Pupils | Grades | Scaled Scores | Pupils | Grades | Scaled Scores 
A | 9 | 54 | D | 10 | 51 
B | 7 | 35 | E | 8 | 64 
C | 11 | 41 | F | 9 | 27 
26. Using the percentile norms reproduced in Table 26 for the Aspects 
of Personality Inventory, determine the percentile rank on each 
section of the test for the pupils whose scores and sexes are shown 
below. 

Section Scores 
Pupils | Sexes | Ascendancy-Submission | Introversion-Extroversion | Emotional Stability 
A | Boy | 25 | 19 | OT 
B | Girl | 15 | 25 | 30 
C | Boy | 18 | 22 | 29 
D | Girl | 22 | 18 | 26 
E | Girl | 17 | 21 | 29 
F | Boy | 14 | 24 | 27 
14 


Determining Relationships among the 
Results of Measurement 


THE FOLLOWING points involving the relationships among test scores 
are discussed in this chapter: 

A. Need for measures of relationship. 
B. Correlation coefficients as measures of relationship. 
C. Computation of the product-moment correlation coefficient. 
D. Meaning of the correlation coefficient. 
E. Determination of test validity. 
F. Determination of test reliability. 

The discussion of Chapter 12 was concerned with the classification 
of test scores and computation of the two basic types of measures 
used in describing a single set of scores—measures of central tendency 
and measures of variability or dispersion. Chapter 13 presented 
various types of formal and informal derived scores, graphical 
representation, and norms, all designed to give meaning to a single 
set of scores. There remains to be considered the type of situation in 
which two sets of scores are obtained for the same group of indi- 
viduals and in which some measure of the relationship between the 
two sets of scores is desired. 




1 RELATIONSHIP BETWEEN SETS OF TEST SCORES 


Need for measures of relationship 
In the selection, construction, and use of educational measuring 
instruments there are many situations in which a reasonably exact 
expression of the relationship existing between two sets of measures 
is necessary. For example, the one test that most nearly measures 
the desired ability must be selected from a series of related tests. 
The method followed in such a case involves finding the relationship 
between the several tests and the ability to be measured. This pro- 
cedure, called the method of correlation, is applied when two, or even 
more than two, measures of the same individuals are employed in 
determining the degree to which certain tested traits or abilities are 
related. In practical test construction this method is used in obtain- 
ing estimates of the validity, reliability, and objectivity of a test. 


Nature of the correlation coefficient 


In the expression of relationships, as in other statistical measures, 
it is desirable to use a single mathematical value. Methods have been 
developed for describing relationships in terms of the correspondence 
between rank positions of scores and in terms of the percentage of 
scores falling within a specified unit of variability of each other, but 
ordinarily these procedures lack sufficient exactness to warrant their 
general use in the analysis of test results. The student who is inter- 
ested in these different methods will find them discussed in certain 
of the treatments on statistical methods listed in the references at 
the end of this chapter. The one method considered here, the Pearson 
Product-Moment Method, is by far the most common and is, on the 
whole, the basic method used in educational investigations. This 
method, while somewhat complicated and difficult because of the 
large number of different calculations to be made, really involves 
comparatively little that is new to the student. 

The Pearson product-moment coefficient of correlation, indicated 
by r, is a single numerical index that expresses the extent to which 
the pairs of corresponding measures of two variables tend to deviate 
similarly from their respective arithmetic means. The values of r 
may vary all the way from +1.00, indicating perfect positive rela- 
tionship, through all of the possible decimals to zero (0.00), indicat- 
ing no relationship whatever, to —1.00, indicating a perfect negative 
relationship. The following are illustrations of a positive relationship 
between two factors: 


(1) The rise and fall of a column of mercury in a thermometer with the 
rise and fall of the outside temperature. As the temperature rises 
the column of mercury also rises. 
(2) The direction of the wind and the movement of smoke from a chim- 
ney. The smoke moves away with the wind. 
(3) The tendency of pupils who are intelligent to be good silent readers. 


Negative correlation may be illustrated by: 


(1) The movement of the elevator cage and the counterbalancing 
weights. As the elevator cage goes up, the counterbalancing weights 
move in the opposite direction. 

(2) The relation between absence from school and school achievement. 


Zero or indifferent correlation is best illustrated by means of the 
chance matching of numbered cards that have been shuffled. Two 
packs of 25 blank cards each may be numbered and the packs care- 
fully shuffled so that the cards are in no systematic order. If cards 
are drawn at random from each pack and paired, the resulting re- 
lationship is likely to be close to zero. If these same packs of cards 
are both arranged in ascending order and the first card from one 
paired with the first card from the other, the resulting relationship 
will be positive. If one pack is inverted and each time a small- 
numbered card is taken from the one pack a large-numbered card is 
taken from the other, the resulting correlation will be negative. 
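
The card illustration can be tried out with a short Python sketch. It computes r directly from deviations about the two means, as the sum of the products of paired deviations divided by the square root of the product of the two sums of squared deviations. This raw-score form is used here for brevity and is not the tabular procedure developed later in the chapter. The fixed seed and the function name are arbitrary choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


pack = list(range(1, 26))  # 25 numbered cards

# Both packs in ascending order: perfect positive relationship.
same_order = pearson_r(pack, pack)

# One pack inverted: perfect negative relationship.
inverted = pearson_r(pack, pack[::-1])

# Both packs shuffled: a chance pairing, usually close to zero.
rng = random.Random(0)  # arbitrary fixed seed so the run can be repeated
first, second = pack[:], pack[:]
rng.shuffle(first)
rng.shuffle(second)
shuffled = pearson_r(first, second)

print(same_order, inverted, shuffled)
```

Repeating the shuffle with other seeds gives values that scatter about zero, some positive and some negative, which is what is meant by a chance relationship.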

This illustration of the numbered cards suggests one of the simple 
methods of expressing correlation, viz., the method of ranking. If 
pupils are given two tests and the scores from the tests tend to place 
the same pupils in the same relative positions in each series, there 
is an indication of a positive correlation between the two tests. For 
example, the accompanying pairs of scores for nine pupils indicate 
a high positive relationship between the two tests because the pupil 
making the highest score on Test A also made the highest score on 
Test B and each other pupil in the list maintained his relative posi- 
tion on both tests. This suggests that the two tests measure abilities 
which have a great many factors in common. On the other hand, if it 
had happened that Pupil 1, who made a score of 89 on Test A, had 
made a score of 18 on Test B, that Pupil 2, who made a score of 85, had 
made a score of 20 on Test B, and that scores for the other pupils were 
similarly interchanged, the resulting correlation would have been 
negative and would have shown that the two tests were inversely 
related. That is, the negative correlation would have indicated that 
high ability on Test A accompanied low ability on Test B. 
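
A rough sketch of the ranking idea follows in Python. It uses Spearman's rank-difference formula, which this text does not present and which is supplied here only as one common way of turning rank agreement into a single index. Apart from the scores of 89, 85, 18, and 20 mentioned above, every score is hypothetical, and ties are assumed not to occur.

```python
def ranks(scores):
    """Rank 1 goes to the highest score; ties are assumed not to occur."""
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]


def rank_correlation(xs, ys):
    """Spearman rank-difference coefficient: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Nine pupils; values other than 89, 85, 18, and 20 are hypothetical.
test_a = [89, 85, 80, 76, 72, 68, 63, 59, 54]
test_b = [38, 36, 33, 31, 29, 26, 24, 20, 18]

agreeing = rank_correlation(test_a, test_b)          # same relative positions
reversed_b = rank_correlation(test_a, test_b[::-1])  # positions interchanged
print(agreeing, reversed_b)
```

When every pupil holds the same rank on both tests the index is +1, and when the order on Test B is completely reversed it is -1.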

The Pearson product-moment coefficient of correlation is computed 
from data arranged in a frequency table, but the table used is in a 
different form from that employed in any of the problem work in 
this book thus far. The method of tabulating paired data is explained 
in the following section. 


TABLE 27. Pairs of test scores 


Pupil Number | Score on Test A | Score on Test B 
1 
2 
3 
4 
5 
6 
7 
8 
9 


2 COMPUTATION OF PEARSON PRODUCT-MOMENT 
CORRELATION COEFFICIENT 


Although the Pearson product-moment coefficient of correlation is 
not always the easiest measure of relationship to compute, it is the 
most reliable and the most widely used measure of this type. A de- 
tailed illustration of its computation is given on the following pages. 


Computing the Pearson product-moment r 


The speed and comprehension scores made by 40 pupils on a cer- 
tain reading test are used in the following illustration showing how 
the scores are tabulated in a double-entry table and the correlation 
coefficient is computed. The procedures are illustrated in Table 28. 
‚ (т) Set up a double-entry table. Construct a frequency distribution 
for one set of scores along the left side of the table in form identical 
to that used in setting up a single frequency distribution in Chapter 
12. This distribution is on the Y-axis of the correlation chart. Con- 
struct a similar frequency distribution for the second set of test 
scores horizontally across the top of the table with the low scores 
at the left and the high scores at the right. This distribution is on 
the X-axis of the chart. Extend the-chart to the right by adding 
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columns headed f,, dy, fd,, 31,2, and xy. Similarly extend the chart 
downward by adding rows headed fs, de, fde, and fd;?. It is desirable 
to use squared paper in setting up the correlation table. The result 
should have the same general form as does Table 28. 

(2) Tabulate the pairs of scores in the double-entry table. A tally 
mark in a frequency distribution shows one fact—the position in the 
distribution of the score tabulated. A tally mark in a double-entry 
table or scatter diagram shows two facts—the score made by the 
pupil on each of two tests. Therefore, tabulate the scores so that the 
tally for each pair of scores is in the table cell that simultaneously 
represents the score of the individual on each of the two tests. For 
example, the tally mark in the lower left corner of Table 28 accounts 
for a speed score of 9 and a comprehension score of 21 made by a 
certain pupil. The mark is in the row for speed scores of 8 to 10 and
in the column for comprehension scores from 18 to 22. For another 
pupil, having speed and comprehension scores of 32 and 49 respec- 
tively, a tally mark appears in the cell where the 32-34 row and the 
48-52 column cross. The remaining 88 scores, not shown here, were 
similarly tabulated in the appropriate cells to complete the scatter 
diagram of the table. 

In completing the scatter diagram, tabulate each pair of scores
separately and total the tallies in each row and in each column, re-
cording the resulting frequencies in the $f_y$ column and the $f_x$ row.
Separately add the frequencies in the column and row to obtain the
total number of cases. The two totals should agree exactly, as shown
in the N of 90 for Table 28.
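
The tabulation step can be sketched in a few lines of code. This is a minimal illustration, not the authors' procedure: the interval widths (3 for speed, 5 for comprehension) and lowest intervals (8-10 and 18-22) are inferred from the intervals quoted in the text, the helper names are introduced here, and only the two pairs of scores actually quoted above are tallied.

```python
from collections import Counter

SPEED_LOW, SPEED_WIDTH = 8, 3   # speed intervals 8-10, 11-13, ... (Y-axis rows)
COMP_LOW, COMP_WIDTH = 18, 5    # comprehension intervals 18-22, 23-27, ... (X-axis columns)

def cell_for(speed, comprehension):
    """Return the (row, column) indices of the cell that holds one pair of scores."""
    row = (speed - SPEED_LOW) // SPEED_WIDTH
    column = (comprehension - COMP_LOW) // COMP_WIDTH
    return row, column

def interval_label(index, low, width):
    """Return the class interval, such as '32-34', for a row or column index."""
    start = low + index * width
    return f"{start}-{start + width - 1}"

# The two pairs quoted in the text; the other 88 pairs are not reproduced there.
pairs = [(9, 21), (32, 49)]

tallies = Counter(cell_for(s, c) for s, c in pairs)
row_totals = Counter()      # the f_y column
column_totals = Counter()   # the f_x row
for (row, column), count in tallies.items():
    row_totals[row] += count
    column_totals[column] += count

for (row, column), count in sorted(tallies.items()):
    print(interval_label(row, SPEED_LOW, SPEED_WIDTH),
          interval_label(column, COMP_LOW, COMP_WIDTH), count)

# Both marginal totals must give the same N.
assert sum(row_totals.values()) == sum(column_totals.values()) == len(pairs)
```

The closing check mirrors the rule that the column of row frequencies and the row of column frequencies must sum to the same number of cases.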

(3) Assume values for the means and count off the deviations.
Assume a value for the mean of the scores on the Y-axis and count
off the deviations in the $d_y$ column, as was done in step 2, page 322,
in the computation of the arithmetic mean. Similarly, assume a mean
for the distribution of the X-axis and count off deviations in the $d_x$
row to the right (positive signs) and to the left (negative signs) of
the interval in which the mean is assumed to lie.

In Table 28, since the mean of the speed scores on the Y-axis was 
assumed to be 33.00, or the midpoint of the interval 32-34, the 
deviations were counted upward and downward from that interval. 
Likewise, since the mean of the comprehension scores on the X-axis 
was assumed to be 45.00, or the midpoint of the interval 43-47, the 
deviations were counted to the right (positive) and to the left (nega-
tive) from that interval.


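TABLE 28. Computation of the Pearson product-moment coefficient of
correlation between speed and comprehension scores on a
certain reading test

[Scatter diagram not reproduced. Comprehension scores are on the X-axis and speed scores on the Y-axis; N = 90. The summary computations, in class-interval units, are given below.]

$$c_x = \frac{\Sigma fd_x}{N} = \frac{-89 + 68}{90} = -.233 \qquad c_y = \frac{\Sigma fd_y}{N} = -.367$$

$$c_x^2 = (-.233)^2 = .054 \qquad c_y^2 = (-.367)^2 = .135$$

$$\frac{\Sigma fd_x^2}{N} = 4.522 \qquad \frac{\Sigma fd_y^2}{N} = \frac{853}{90} = 9.478$$

$$\sigma_x = \sqrt{4.522 - .054} = \sqrt{4.468} = 2.114$$

$$\sigma_y = \sqrt{9.478 - .135} = \sqrt{9.343} = 3.057$$
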
(4) Compute the standard deviations in class-interval form. Com-
pute the standard deviations for the Y-axis and the X-axis scores by
the procedure outlined in Chapter 12, except that the final step of
multiplying by the size of the class interval is omitted. Since the
entire process of computing the correlation coefficient is carried on
by the use of the class-interval scales rather than the score scales,
the standard deviations are here stated in class-interval rather than
in score units. Except for the fact that the $f$, $d$, $fd$, and $fd^2$ values for
the X distribution appear across the table at the bottom, rather than
vertically as for the Y distribution, the steps in computing the stand-
ard deviations present no new difficulties. The two standard devi-
ations, $\sigma_x$ and $\sigma_y$, with the subscripts indicating the X-axis and Y-axis
data, are 2.114 and 3.057, respectively, for the data of Table 28.¹
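
As a check on the arithmetic of this step, the following sketch recomputes the two standard deviations in class-interval units from the means of the fd² values (4.522 and 9.478) and the corrections (−.233 and −.367) reported for Table 28. The function name is introduced here for illustration; the formula is the one the step describes, the square root of the mean of the fd² values less the square of the correction.

```python
from math import sqrt

def sd_in_intervals(mean_fd2, correction):
    """Standard deviation in class-interval units: sqrt(sum(fd^2)/N - c^2)."""
    return sqrt(mean_fd2 - correction ** 2)

sigma_x = sd_in_intervals(4.522, -0.233)   # comprehension (X-axis)
sigma_y = sd_in_intervals(9.478, -0.367)   # speed (Y-axis)
print(round(sigma_x, 3), round(sigma_y, 3))
```

Multiplying either result by the size of its class interval would restore score units, which is exactly the step omitted here.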

(5) Compute the sum of the product moments. The name
“product-moment method” implies the significant feature of the
process. The relationship itself takes into account the operation of
forces (frequencies) at varying distances (deviations in intervals)
from the point of rotation (mean) on each axis. Since each measure
assumes a position with regard to each of the two axes, the resulting
moments must take this fact into account.²

Table 29 is presented both to illustrate the principle of product
moments and as an aid to the student in later computations. As the
0 row on the Y-axis and the 0 column on the X-axis represent the
assumed means of the two distributions, deviations are counted in
both directions from the 0 row and the 0 column in the same manner
as is shown in Table 28. The moment of any cell in the table is the
product of its deviations on the two axes when the signs of the devi-
ations are taken into account. For example, the moment of the cell
in the upper right corner (21) is obtained as the product of its two
deviations, 7 × 3 = 21. Similarly, the moment of the cell having a
deviation of +1 on the X-axis and −2 on the Y-axis is −2
(1 × −2 = −2). The moment of each cell is given in its upper
right corner.


1 Computations for the correlation coefficient should uniformly be carried to 
four decimal places and rounded back to three decimal places, otherwise using the 
same procedure for rounding numbers as was illustrated in Chapter 12, page 321. 

2 The reader will recall the illustration of moments of forces given in Chapter 12 
in connection with the computation of the arithmetic mean. Forces are brought 
into balance by equating their moments, whether two forces are operating in one 
direction, as is the case for a single frequency distribution, or whether two forces 
are operating in each of two different directions, as is the case for a scatter diagram. 
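TABLE 29. Moments of cells in a double-entry table

[Table body not reproduced. Deviations on the X-axis run across the table and deviations on the Y-axis run vertically; each cell shows the product of its two deviations.]
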
Values carried to the xy column at the right side of Table 28 are
sums of the product moments for all scores in each row of the table.
Since the two scores in the upper right corner have moments of 21 each,
the product moment of 42 (2 × 7 × 3) is shown in the xy column. The
13 scores in the interval 26-28 on the speed test will be used for an
additional illustration. The two scores at the left have a moment of
6 each (−2 × −3); thus their product moment is 12 (2 × 6). The
product moment, shown in one operation, for the next four scores is
16 (4 × −2 × −2). Similarly, the next five scores have a product
moment of 10 (5 × −2 × −1). The next score is in the 0 column
on the X-axis, and consequently has a moment of 0 (1 × −2 × 0).
Therefore, the sum of the positive product moments, as obtained
above, is 38 (12 + 16 + 10). But the remaining score, to the right
of the 0 column, must also be taken into account. Its product moment
is −2 (1 × −2 × 1). The algebraic sum of the 38 and the −2 is 36;
hence, that is the xy value shown in Table 28 for the 13 scores in the
Y-axis interval 26-28. The other xy values were similarly computed,
and the sum of these values, or $\Sigma xy$, was found to be 513.

In computing the product moments, follow the procedure illus-
trated above. First find the moment of each cell in which at least one
tally mark lies. Then find the product moment of each cell. Next sum
the product moments, taking account of signs, in each row of the
table and carry the results to the xy column. Finally, add the xy
values to obtain $\Sigma xy$.
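
The row-by-row arithmetic just described can be reproduced with a short sketch. The cell frequencies and deviations are those given in the text for the speed interval 26-28 and for the upper right corner of Table 28; the function and variable names are introduced here for illustration.

```python
# Row of the scatter diagram for speed scores 26-28, whose Y-axis deviation is -2.
d_y = -2

# X-axis deviation of each occupied cell mapped to its frequency, as described in the text.
row_cells = {-3: 2, -2: 4, -1: 5, 0: 1, 1: 1}

def row_product_moment(d_y, cells):
    """Sum of frequency * d_x * d_y over the occupied cells of one row."""
    return sum(f * d_x * d_y for d_x, f in cells.items())

xy_row = row_product_moment(d_y, row_cells)
print("scores in row:", sum(row_cells.values()))
print("xy value for row:", xy_row)

# The two scores in the upper right corner: d_y = 7, d_x = 3.
print("xy value for top row:", row_product_moment(7, {3: 2}))
```

Signs take care of themselves once each deviation carries its own sign, so the positive and negative product moments need not be summed separately.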

(6) Obtain the mean of the product moments. Divide the $\Sigma xy$
of the preceding step by the number of cases to obtain $\Sigma xy/N$. For
the data of Table 28, this is 513/90 or 5.700.

(7) Obtain the product of the corrections. As $\Sigma fd/N = c$, both on
the X-axis and the Y-axis, multiply $c_x$ by $c_y$. For Table 28, these
values are −.233 and −.367, and their product was found to be .086.

(8) Obtain the product of the standard deviations. Obtain the
product of the standard deviations on the two axes. Since the values
given in Table 28 are 2.114 ($\sigma_x$) and 3.057 ($\sigma_y$), their product ($\sigma_x\sigma_y$)
was found to be 6.462.

(9) Substitute in the formula and solve for r. Obtain the correla-
tion coefficient by substituting the values obtained in the above steps
in the formula:

$$r = \frac{\dfrac{\Sigma xy}{N} - c_x c_y}{\sigma_x \sigma_y}$$

in which r is the correlation coefficient, N is the number of cases,
$c_x$ and $c_y$ are corrections on the X-axis and the Y-axis, $\sigma_x$ and $\sigma_y$ are
the standard deviations on the X-axis and the Y-axis, and $\Sigma xy$ is the
sum of the product moments for all scores in the scatter diagram.
For the correlation chart of Table 28, the numerator of the fraction
is 5.700 − .086, or 5.614. The denominator, given above, is 6.462.
Therefore, r is 5.614 ÷ 6.462, or +.869.
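
The final substitution can be verified with the summary values reported for Table 28. The sketch below is only a check on the arithmetic of steps (6) to (9); the function name is introduced here and is not part of the text.

```python
def pearson_r_from_summary(sum_xy, n, c_x, c_y, sigma_x, sigma_y):
    """r = (sum(xy)/N - c_x * c_y) / (sigma_x * sigma_y), all in class-interval units."""
    mean_product_moment = sum_xy / n           # step 6
    product_of_corrections = c_x * c_y         # step 7
    product_of_sds = sigma_x * sigma_y         # step 8
    return (mean_product_moment - product_of_corrections) / product_of_sds  # step 9

r = pearson_r_from_summary(sum_xy=513, n=90, c_x=-0.233, c_y=-0.367,
                           sigma_x=2.114, sigma_y=3.057)
print(f"r = {r:+.3f}")
```

Because both standard deviations and both corrections are expressed in class intervals, no conversion to score units is needed before dividing.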


Summary of steps in computing the Pearson 
product-moment correlation coefficient 


Given below in summary form are the steps of procedure for com- 
puting the Pearson product-moment r by the use of a double-entry 
table or scatter diagram. 


(1) Set up a double-entry table. Construct a frequency distribution at
the left of the chart (Y-axis) for one of the variables, using the
procedure of steps 1 to 3, page 316, for setting up a frequency
distribution. Construct a second frequency distribution by the same
procedure across the top of the chart (X-axis) for the other variable,
with the low scores at the left and the high scores at the right.
(2) Tabulate the pairs of scores in the double-entry table. Place a tally
mark in the cell of the table that correctly represents the paired
scores of each individual on the Y-axis and X-axis variables. Sum
the tally marks in the $f_y$ column to obtain N and in the $f_x$ row to
obtain N again as a check on accuracy.
(3) Assume values for the means and count off the deviations. Assume
a value for the mean on each axis and count off the deviations in the
d column and row, using the procedure of steps 1 and 2, page 322,
for computing the arithmetic mean. Deviations upward and to the
right are positive; deviations downward and to the left are negative.
(4) Compute the standard deviations in class-interval form. Obtain for
both the X-axis and the Y-axis data the necessary sums of the fd
values ($\Sigma fd$); sums of the $fd^2$ values ($\Sigma fd^2$); means of the fd
values or corrections ($\Sigma fd/N$, or c); means of the $fd^2$ values
($\Sigma fd^2/N$); and squares of the corrections ($c^2$). Then obtain the
standard deviation on each axis in class-interval form ($\sigma_x$ and $\sigma_y$),
using the procedure of steps 3 to 7a, page 337, for computing the
standard deviation. As the standard deviations here are to be left
in class-interval form, the final operation of multiplying by the size
of the class interval (step 7b, page 337) should be omitted.
(5) Compute the sum of the product moments. Determine the moment
(product of X-axis and Y-axis deviations) for each cell in which at
least one tally mark appears, obtain the product moment for each
such cell by multiplying its moment by its frequency, carry the
sum of the product moments in each row to the xy column, and
obtain the sum of the xy values ($\Sigma xy$).
(6) Obtain the mean of the product moments. Divide the result of step
(5) above by the number of cases ($\Sigma xy/N$).
(7) Obtain the product of the corrections. Obtain the product of the
X-axis and Y-axis corrections determined in step (4) above ($c_x c_y$).
(8) Obtain the product of the standard deviations. Obtain the product
of the X-axis and Y-axis standard deviations determined in step
(4) above ($\sigma_x\sigma_y$).
(9) Substitute in the formula and solve for r. Substitute from above
the $\Sigma xy/N$ of step (6), the $c_x c_y$ of step (7), and the $\sigma_x\sigma_y$ of step
(8) in the formula

$$r = \frac{\dfrac{\Sigma xy}{N} - c_x c_y}{\sigma_x \sigma_y}$$

and solve to obtain the correlation coefficient.
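
The nine steps can be gathered into one short routine. The sketch below is a minimal illustration under stated assumptions: the scatter diagram is represented as a mapping from each cell's pair of deviations (in class intervals from the assumed means) to its tally count, the function name is introduced here, and the small table of tallies is hypothetical rather than the data of Table 28.

```python
from math import sqrt

def r_from_scatter(cells):
    """cells maps (d_x, d_y), deviations in class intervals from the assumed
    means, to the tally count of that cell. Returns the product-moment r."""
    n = sum(cells.values())                                      # step 2: N
    c_x = sum(f * dx for (dx, _), f in cells.items()) / n        # step 4: corrections
    c_y = sum(f * dy for (_, dy), f in cells.items()) / n
    mean_fd2_x = sum(f * dx * dx for (dx, _), f in cells.items()) / n
    mean_fd2_y = sum(f * dy * dy for (_, dy), f in cells.items()) / n
    sigma_x = sqrt(mean_fd2_x - c_x ** 2)                        # step 4: standard deviations
    sigma_y = sqrt(mean_fd2_y - c_y ** 2)
    sum_xy = sum(f * dx * dy for (dx, dy), f in cells.items())   # step 5
    return (sum_xy / n - c_x * c_y) / (sigma_x * sigma_y)        # steps 6 to 9

# A small hypothetical scatter diagram (not the data of Table 28).
cells = {(-2, -2): 1, (-1, -1): 2, (-1, 0): 1, (0, 0): 3,
         (1, 0): 1, (1, 1): 2, (2, 1): 1, (2, 2): 1}

r = r_from_scatter(cells)
print(f"N = {sum(cells.values())}, r = {r:+.3f}")
```

The choice of assumed means does not affect the result, since the corrections remove whatever error the assumption introduces.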


3 MEANING OF CORRELATION COEFFICIENTS 


The method of calculating the correlation coefficient as outlined 
in the foregoing pages is quite mechanical and, as such, can be 
mastered readily by most students. The interpretation of the meaning 
or significance of a correlation coefficient is often quite another 
matter, for no entirely satisfactory mechanical device for accomplish- 
ing this has thus far been developed. A number of suggestions have 
been made recently, however, by means of which the student may be 
aided in attaching meaning to the correlation coefficient. 


Not a measure of cause and effect 


Sometimes the method of correlation is mistakenly used in the 
attempt to discover causes operating to produce certain effects. There 
is nothing in the method or the result of computing a correlation 
coefficient that indicates definitely which of the factors is a cause 
and which is an effect or whether both of the factors may be affected 
by other variables. For example, reading speed and reading com-
prehension are positively related for pupils in several adjacent school
grades. But it may not be inferred that a pupil reads with high (or
low) comprehension because he reads rapidly (or slowly). Neither 
can high (or low) comprehension be thought of as causing rapid (or 
slow) reading. It is likely that some other variable not considered 
in the correlational relationship, e.g., mental maturity, which is 
positively related both to reading speed and reading comprehension, 
serves to explain the relationship. Thus, the cause or the explanation 
of why a positive relationship exists is not necessarily apparent in 
the data themselves but must often be sought elsewhere. 


Significance for prediction 


It will be remembered that correlation is usually indicated by 
means of what is called a double-entry table, correlation table, or 
scatter diagram. The appearance of the scatter diagram itself gives 
some indication of the amount of relationship that exists between 
the two variables shown. Assuming that the scatter diagram is made 
by tabulating upward and toward the right, which is almost a uni-
versal practice, a high r is usually found where there is a very definite
clustering of the cases along what would be the lower left to upper 
right diagonal of the table. This means that the cases tend to be 
grouped somewhat systematically along a line running from the lower 
left-hand corner of the table to the upper right-hand corner of the 
table. This type of grouping is shown in Table 28 on page 376. As 
the cases scatter from the line of this diagonal, the correlation is 
reduced. If the cases are scattered over the table in a generally 
circular arrangement, the resulting correlation will approximate zero. 
As the relationship changes from positive to negative, the elliptical 
grouping of the cases takes place along a diagonal running from upper 
left to lower right. After some experience with "scatter diagrams," 
the student will come to have a definite feeling about what probable 
magnitude of the correlation coefficient to expect. 

One of the important outcomes of the use of correlation methods is 
that within certain limits it makes possible the estimating of un- 
known values from known values. The accuracy of this estimate, 
however, depends directly on the correlation between the factors 
measured. For example, if it is known from previous experience that 
there is a high positive relationship between achievement in a 
specific subject and the pupils’ scores on a certain aptitude test, the 
probable achievement of a group of pupils in this course may be
determined within limits by securing their scores on the aptitude
test. A correlation coefficient of +1.00 for the two factors would
mean that an estimate of accomplishment based on the one factor
would be 100 per cent correct. As the amount of the correlation de-
creases, the accuracy of the forecast declines, but not in a direct
manner. A correlation of +1.00 means 100 per cent accuracy in the
estimate based on the relationship, but a correlation of +.50 does
not mean that the estimate based on it will be 50 per cent correct.
A glance at the accompanying table will demonstrate this interesting
fact about the correlation coefficient. The percentages of forecasting
accuracy for different values of r given in Table 30 are obtained by
applying the formula for the coefficient of alienation ($k = \sqrt{1 - r^2}$)
and then deducting the resulting values, expressed as percentages,
from 100. In cases where estimates of one variable are to be made
from measurements of another related variable, this table will prove
to be a useful safeguard.
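
The rule just stated is easily put into code. The sketch below applies it to a few illustrative values of r; these values are chosen here for demonstration and are not necessarily the entries of Table 30, and the function name is likewise introduced for illustration.

```python
from math import sqrt

def forecasting_efficiency(r):
    """Per cent of forecasting efficiency: 100 * (1 - k), with k = sqrt(1 - r^2)."""
    k = sqrt(1 - r * r)   # coefficient of alienation
    return 100 * (1 - k)

for r in (1.00, 0.90, 0.80, 0.50, 0.00):
    print(f"r = {r:.2f}   efficiency = {forecasting_efficiency(r):5.1f} per cent")
```

The output makes plain how slowly forecasting efficiency rises at first: a coefficient of .50 yields an efficiency of only about 13 per cent, not 50 per cent.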


TABLE 30. Percentages of forecasting accuracy for certain values of r

Coefficient of Correlation | Per Cent of Forecasting Efficiency

[The entries of this table are not reproduced in this copy.]
A word of warning should possibly be given here in order that the
student may not become overoptimistic in the interpretation of cor- 
relations. It is better to be on the safe side when making claims for 
the reliability or the forecasting power of a test. Much damage has 
been done to the cause of educational measurements by the un- 
qualified and exaggerated statements of misinformed individuals. As 
a result of frequent misinterpretation of the measures of relationship, 
many tests of questionable validity and reliability have been ac- 
cepted and used widely. 


r | Educational Situation | Interpretation

+.96 | Relation between scores on two forms of a long, analytical reading test for high-school pupils. | Evidence of unusually high reliability; scores may be treated with confidence.

+.90 | Relation between scores on two forms of a 45-minute group intelligence test. | Evidence of marked reliability.

+.80 | Relation between scores on the same form of a group test of intelligence at the beginning and end of a semester. | Evidence of marked relationship; considerable prognostic power even after lapse of time.

+.50 | Relation between scores on a good group intelligence test and course marks of a class in first-year algebra. | Evidence of a medium relationship of little value for forecasting purposes (only 13% effective).

−.24 | Relation between chronological ages of pupils in a given grade and scores on an objective achievement test. | Evidence of an indifferent negative relationship; shows a slight tendency for the younger pupils in a grade to achieve at a higher level than the average.


The preceding illustrations and practical interpretations of typical 
correlation coefficients representative of the sort obtained from edu- 
cational data have been gleaned from a number of sources. They are 
offered here for whatever guidance they may give to the student or 
the teacher in making a critical and conservative interpretation of 
correlation data. 
4 PRACTICAL USES OF CORRELATION COEFFICIENTS


The classroom teacher and the student of measurement will find 
the greatest opportunity to use correlation techniques in connection 
with the construction and analysis of objective tests and in the criti- 
cal selection of standardized tests. The uses briefly mentioned and 
in three cases illustrated below all relate to the determination of test 
validity, reliability, or objectivity. 


Determination of test validity 


Test validity can be determined in terms of correlations between 
scores on the test and: (1) teachers’ marks, (2) ratings of expert
judges, (3) other known measures, and (4) measures of future out- 
comes. All of these situations involve only the ordinary application 
of the correlation method; therefore, as their values are discussed 
in Chapter 4, they are not discussed further here. 


Evaluation of test reliability 


The correlation coefficient enters directly into the procedures most 
common for determining or estimating the reliability or consistency 
of a test, or the degree to which it measures whatever it does meas- 
ure. As is pointed out more in detail in Chapter 4, there are three 
correlation methods and two non-correlation methods that can 
effectively be used by the teacher in estimating the reliability of his 
classroom tests. These are: (1) the reliability coefficient, (2) the 
retesting coefficient, (3) the “chance-half” coefficient, (4) the “foot- 
rule" coefficient, and (5) the standard error of measurement. 

Reliability coefficient. This coefficient requires only brief mention
here, because it involves the usual type of correlational relationship 
between two series of scores. The reliability coefficient itself is ob- 
tained only by correlating scores made by the same pupils on two 
equivalent forms of the same test. 

Retesting coefficient. The retesting coefficient, requiring correla- 
tion of scores obtained from a first and a second administration of 
the same test to a group of pupils, furnishes only an estimate of test 
reliability. The retesting coefficient is one of the methods used when 
the availability of only one form of the test eliminates the possibility 
of obtaining a reliability coefficient directly. 
“Chance-half” coefficient. This is a second method of estimating
the reliability coefficient from the results of the administration of a 
single test to a pupil group. For this method the first step of pro- 
cedure is to obtain two “half-scores” for each pupil on arbitrary 
halves of the test. The arbitrary halves of the test frequently consist 
of the odd-numbered and the even-numbered items. The second step 
is to obtain the coefficient of correlation between the sets of half- 
scores for the group of pupils. This coefficient represents the re- 
liability of one-half of the test, but not of the entire test. 

The third and final step requires the use of the Spearman-Brown 
Prophecy Formula in estimating the reliability for the entire test by 
what is known as “stepping up” the correlation. A test increases in 
reliability as it is increased in length by additional test items com-
parable to those in the initial test; thus, the estimated reliability for
the entire test is greater than for only half of the test. However, the 
increase in the coefficient is not directly proportional to the increase 
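in test length. The Spearman-Brown formula ³ is

$$r_{12} = \frac{2\,r_{\frac{1}{2}\frac{1}{2}}}{1 + r_{\frac{1}{2}\frac{1}{2}}}$$

where $r_{\frac{1}{2}\frac{1}{2}}$ is the correlation between scores on the “chance-halves”
of the test and $r_{12}$ is the estimated reliability coefficient for the
entire test.

If an estimate of the reliability of an entire test is desired when
the correlation coefficient between its “chance-halves” is .85, the
following result is obtained.

$$r_{12} = \frac{2 \times .85}{1 + .85} = \frac{1.70}{1.85} = .919$$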

This is the procedure a teacher may use to obtain an estimate of the 
reliability of his test from a single administration. 
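
The stepping-up of a half-test correlation can be sketched as follows. The function name and its `n` parameter are introduced here for illustration; with `n = 2` it is the special case used for chance-halves, and other values of `n` give the general prophecy formula for a test n times as long.

```python
def spearman_brown(r, n=2):
    """Estimated reliability of a test n times as long as one with reliability r."""
    return n * r / (1 + (n - 1) * r)

r_half = 0.85                      # correlation between the chance-halves
r_full = spearman_brown(r_half)    # estimated reliability of the entire test
print(f"{r_full:.3f}")

# Lengthening the test further raises the estimate, but by diminishing amounts.
for n in (1, 2, 3, 4):
    print(n, f"{spearman_brown(r_half, n):.3f}")
```

The loop illustrates the point made in the text that the gain in reliability is not directly proportional to the increase in test length.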


3 The general form of the formula, which is not of direct concern here, is:

$$r_n = \frac{n\,r_{12}}{1 + (n - 1)\,r_{12}}$$

in which $r_{12}$ represents the coefficient of reliability of a test and $r_n$ represents the
coefficient of reliability of a test of homogeneous test materials n times as long. It
should be noted that substitution of 2 for n in this formula, to determine the effect
of doubling the length of the test, results in the special formula given above except
for differences in the subscript for r.




“Footrule” coefficient. This coefficient is a third and quite simple
method of obtaining an estimate of the reliability of a test available
in only one form.⁴ The only values required are the arithmetic mean
and standard deviation of the test scores and the number of items in
the test. The formula is

$$r_{12} = \frac{n}{n - 1} \times \frac{\sigma_t^2 - npq}{\sigma_t^2}$$

where $p = \frac{M}{n}$ and $q = 1.00 - p$, and where M is the arithmetic
mean of the test scores, $\sigma_t$ is the standard deviation of the test scores,
and n is the number of test items.

The “Footrule” coefficient of a test of 249 items for which the
arithmetic mean and standard deviation of scores were respectively
168.65 and 25.34 would be obtained by the following procedures.

$$p = \frac{M}{n} = \frac{168.65}{249} = .677 \qquad q = 1.00 - p = .323$$

$$r_{12} = \frac{n}{n - 1} \times \frac{\sigma_t^2 - npq}{\sigma_t^2} = \frac{249}{248} \times \frac{25.34^2 - 249 \times .677 \times .323}{25.34^2}$$

$$= 1.004 \times \frac{642.116 - 54.45}{642.116} = 1.004 \times .915 = .919$$
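
The same computation can be sketched in code. The function name is introduced here for illustration; the formula is the one given above (Kuder-Richardson Formula 21), and the inputs are the mean, standard deviation, and number of items of the worked example.

```python
def footrule_coefficient(mean, sd, n_items):
    """Reliability estimate from the mean, standard deviation, and number of items."""
    p = mean / n_items        # average proportion of items answered correctly
    q = 1.0 - p
    variance = sd ** 2
    return (n_items / (n_items - 1)) * (variance - n_items * p * q) / variance

r = footrule_coefficient(mean=168.65, sd=25.34, n_items=249)
print(f"{r:.3f}")
```

Carrying full precision through the intermediate steps, rather than rounding p and q to three places, leads to the same three-place result for these data.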


Standard error of measurement. This measure of the confidence
that may be placed in an obtained score is based on the reliability 
coefficient of the test, but it supplies a more concrete and invariable 
measure of consistency than does the reliability coefficient itself. 
This is a result of the fact that the standard error of measurement 
is not, as is the reliability coefficient, influenced by varying ranges 
of talent in the pupil group upon which the measure is based. 

Computation of the standard error of measurement involves the
use of the formula $S.E._m = S.D.\sqrt{1 - r}$, where $S.E._m$ is the standard
error of measurement, S.D. is the standard deviation of the distribu-
tion of scores, and r is the reliability coefficient of the test. The
standard deviation of the distribution of 37 reading test scores used
to illustrate computational procedures in the preceding chapters was
shown in Table 17, page 333, to be 10.86. The reliability coefficient
of the test, based on the correlation coefficient between the scores
reported in Table 9 and scores made by the same 37 pupils on a
comparable form of the test, was found to be .956. Using this re-
liability coefficient and the value of the S.D. shown above in the
formula, the standard error of measurement becomes

$$S.E._m = 10.86\sqrt{1 - .956} = 10.86 \times \sqrt{.044} = 10.86 \times .210 = 2.28$$

4 G. F. Kuder and M. W. Richardson, “The Theory of the Estimation of Test
Reliability.” Psychometrika, 2:151-60; September 1937 (Formula 21).



Although a complete explanation of how to apply the standard 
error of measurement is not feasible here, its general use can be il- 
lustrated. It will be remembered that a pupil's obtained score is only 
an estimate of what his true score would be if the test were com- 
pletely reliable. As no test is absolutely reliable, an obtained score 
must be interpreted as an estimate of the true score. The standard 
error of measurement indicates how far the obtained score may be 
expected to deviate from the true score. An illustration will be given 
in terms of a score of 48 made by a certain pupil on this test. His 
obtained score is almost certain to fall within three standard errors 
of his true score. Therefore 3 X 2.28, or 6.84, is subtracted from 
and added to 48 (48.00 — 6.84 = 41.16 and 48.00 + 6.84 = 54.84). 
Thus it is a practical certainty that the pupil’s true score on this test 
lies between 41.16 and 54.84. Again, the chances are about two in 
three that his obtained score lies within one standard error of his 
true score. Thus 48.00 — 2.28 and 48.00 + 2.28, or 45.72 and 50.28, 
indicate the limits of his obtained score from his true score that are 
not likely to be exceeded more than once in three times. 
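
The standard error and the two score bands discussed above can be reproduced with a short sketch. The function names are introduced here for illustration; the standard error is rounded to two places before the bands are formed, as in the text.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """S.E.m = S.D. * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)

def score_band(obtained, sem, k):
    """Limits lying k standard errors below and above an obtained score."""
    return obtained - k * sem, obtained + k * sem

sem = round(standard_error_of_measurement(10.86, 0.956), 2)
low3, high3 = score_band(48, sem, 3)   # practical certainty
low1, high1 = score_band(48, sem, 1)   # chances of about two in three

print(f"S.E.m = {sem:.2f}")
print(f"three standard errors: {low3:.2f} to {high3:.2f}")
print(f"one standard error:    {low1:.2f} to {high1:.2f}")
```

A less reliable test with the same standard deviation would give a larger standard error and therefore wider bands around the same obtained score.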

Although there are no definite standards for applying this type of 
check on the reliability of a test score, it may be said that in general 
a test is more reliable when the standard error is small than when it 
is large. It is apparent from the above formula that the standard 
error is small when test reliability is high and large when test re- 
liability is low. This discussion of the reliability of test scores should 
illustrate the fact that too great dependence on test scores is un- 
warranted, for a test score is at best only an estimate, and an 
estimate subject to some degree of error, of what the corresponding
true score would be if it were obtainable. 


Determination of test objectivity 


When a group of test papers has independently been scored twice, 
either by the same person or by different persons, the correlation 
coefficient between the two sets of scores is the objectivity coefficient. 
For a highly objective test, the coefficient should closely approach
+1.00.


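Pupil | Total Scores: Read., Vocab. | Vocabulary Scores: Odd, Even

[The body of this table, which lists the scores of the 40 ninth-grade pupils in two blocks of rows lettered a to t, is largely illegible in this copy. The entries that survive are given below row by row, without column assignment.]

a | 65, 101, 49, 82
b | 106, 52, 74
c | 83, 40, 67
d | 97, 75
e | 54, 117
f | 49, 85
g | 86, 92
h | 103, 86
i | 113, 79
j | 96, 91
k | 94, 69
l | 88, 71
m | 89, 81
n | 106, 63
o | 46, 86
p | 49, 92
q | 70, 74
r | 114, 78
s | 101, 97
t | 79, 74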


Problems on Computing the Correlation Coefficient and 

Estimating the Reliability Coefficient 

27. Compute the Pearson product-moment correlation coefficient be-
tween the reading and vocabulary test scores listed above for 40
ninth-grade pupils. (r = +.731)

28. Obtain an estimate of the reliability coefficient of the vocabulary
test by computing the Pearson product-moment correlation co-
efficient between the “odd” and “even” scores listed above for 40
ninth-grade pupils and then using the Spearman-Brown Prophecy
Formula. ($r_{12}$ = .954)

29. Compute the “Footrule” coefficient for a history test consisting of
120 items on which a class of ninth-grade pupils attained an arith-
metic mean and a standard deviation of 71.20 and 12.60 respec-
tively. ($r_{12}$ = .819)

30. Compute the standard error of measurement for the vocabulary test
of Problem 27, using the $r_{12}$ of .954 and the standard deviation, in
score form, of 12.40. ($S.E._m$ = 2.73)
Measuring and Evaluating in the
Receptive Language Arts 


THE FOLLOWING important points involved in the measurement and 
evaluation of listening and reading skills are summarized in this 
chapter : 


A. The educational and social significance of listening and reading.
B. Major objectives of listening and reading.
C. Testing listening comprehension.
D. Testing work-type reading.
E. Testing in literature.
F. Types of remedial material in reading.


The receptive language arts, which are here considered to consist 
of listening, reading, and the study skills, are distinguished from the 
expressive language arts, which include oral and written language, 
usage, grammar, spelling, and handwriting. Both the receptive and 
expressive aspects of language are included in the basic skills stressed 
in the elementary school and are equally important factors in accom-
plishment in the secondary school. The receptive or assimilative
language arts are dealt with in this chapter, while the following
chapter is concerned with the expressive forms of language. The
treatments in both chapters are confined to the English language,
whereas the foreign languages are the concern of Chapter 17.

1 LISTENING AND READING AS RECEPTIVE LANGUAGE SKILLS


Educational and social significance of listening and reading 


There is a growing conviction on the part of students of the lan- 
guage arts areas that, of the two language channels through which 
information is received, listening has been seriously neglected in favor 
of reading. Reading has been analyzed, investigated, and evaluated 
perhaps more than any other school subject. The educational liter- 
ature of the past decade is heavily loaded with articles, books, de- 
vices, and tests dealing with reading. Listening, on the other hand, 
appears relatively infrequently in the literature. As Brown¹ pointed
out, “if the average individual depended less upon listening than upon
reading, there might be reason for this neglect.” Studies and
observations dating back over a twenty-year period indicate that
listening is without question one of the most frequently used language
activities. Actually, the average adult spends approximately three
times as much time in listening as he does in reading.² If this is the
case, there is ample reason for the growing belief that listening skills 
should be developed as a part of a systematic program of instruction 
in the language arts. 

In a survey designed to discover (1) what percentage of the school 
day children are expected to listen, (2) whether teachers themselves 
are aware of the amount of listening children are expected to do, and 
(3) what relative importance teachers place upon the four phases of 
language education, Wilt³ concluded, among other things, that
pupils are expected to spend more time in listening than in any other 
single activity of the school. Teachers apparently are unaware of the 
above fact. Moreover, they apparently are more concerned about 
the individual who is reading aloud or speaking than they are about 
the listeners. Almost fifteen hundred teachers estimated that children 
learn through reading approximately 110 minutes of the average 
school day, and through listening 78 minutes. Actually, the median 
amount of listening time was 158 minutes, 54 per cent of which was 


1 James I. Brown, “The Construction of a Diagnostic Test of Listening Com- 
prehension.” Journal of Experimental Education, 18:139-46; December 1949. 

2 Paul T. Rankin, “Listening Ability: Its Importance, Measurement and
Development.” Chicago Schools Journal, 12:177-79; January 1930.

3 Miriam E. Wilt, “A Study of Teacher Awareness of Listening as a Factor in
Elementary Education.” Journal of Educational Research, 43:626-36; April 1950.
spent in listening to the teacher, 31 per cent in listening to the other
children, and 15 per cent in miscellaneous listening. While no
comparable data on this problem are available at the high-school or
college levels, it is quite probable that the relative importance of
listening as a factor in learning at these levels is even more
significant than at the elementary-school level. The logical conclusions
from this train of thought should result in making all teachers
sensitive to the importance of listening as a factor in intelligent
communication and in practically all types of learning. This in turn
should result in granting to listening a place of importance in the
curriculum at least equal to that of reading.
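
The arithmetic behind the survey figures can be made concrete. The short Python sketch below converts the reported percentages of the 158-minute median into minutes and compares the observed median with the teachers' estimate of 78 minutes. The variable and function names are illustrative only, and the survey itself reports the percentages rather than the minute values derived here.

```python
# Median daily listening time reported in the survey, in minutes.
MEDIAN_LISTENING_MINUTES = 158

# Teachers' estimate of daily listening time, in minutes.
ESTIMATED_LISTENING_MINUTES = 78

# Percentage of listening time by source, as reported in the text.
SHARES = {
    "teacher": 54,
    "other children": 31,
    "miscellaneous": 15,
}


def minutes_by_source(total_minutes, shares):
    """Convert percentage shares of a total into minutes."""
    return {source: total_minutes * pct / 100 for source, pct in shares.items()}


breakdown = minutes_by_source(MEDIAN_LISTENING_MINUTES, SHARES)
for source, minutes in breakdown.items():
    print(f"{source}: about {minutes:.0f} minutes")

# How far the observed median exceeds what teachers estimated.
ratio = MEDIAN_LISTENING_MINUTES / ESTIMATED_LISTENING_MINUTES
print(f"observed listening is about {ratio:.1f} times the estimate")
```

On these figures the observed listening time is roughly twice what the teachers estimated, which is the point of the comparison in the text.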

The relatively large proportion of classroom time given over to 
listening may seem somewhat less serious when it is remembered that 
only a part of learning occurs in the classroom. The solution of most 
classroom problems in the modern secondary school requires the 
skillful use of books as sources of information. When considered 
from this point of view, reading as a school responsibility is some- 
thing more than merely the rapid comprehension of printed symbols 
and the memory and organization of the materials read. It is 
also the ability to utilize books and libraries as efficient sources of 
information. This tendency to treat reading as a highly important 
tool of learning has resulted in establishing a very close relationship 
between reading and practically every other school activity. As a 
means of gaining information and pleasure it is essential in every 
content subject, such as history, geography, science, literature, and 
arithmetic. 

A full appreciation of the importance of intelligent reading in 
society at large has also developed in recent years. Reading is con- 
sidered the indispensable means by which adults may keep abreast 
of current happenings and familiarize themselves with current social, 
community, political, and national problems. The mass of printed 
matter which the typical adult must read and evaluate, even within 
the limits of his own fields of interest, is stupendous. This situation 
makes all the more imperative the development of a high degree of 
reading skill in our schools. 


Reading and literature 


The use of reading for the acquisition of facts and figures, for the 
satisfaction of academic requirements, represents only one small 
phase of its function. The ability to read opens up rich avenues of
enjoyment and pleasure in the fields of literature which provide the 
mediums through which the individual can become acquainted with 
life, its meaning and significance. Reading reveals those aspects of 
life, its activities, ideas, ideals, and emotions, about which human 
interests cluster. But these are not freed for the individual unless he 
has acquired skill in reading and a desire to sample the right sort 
of experiences in his reading. Thus the teacher of high-school and 
college English faces the dual responsibility of developing new or re- 
viving old reading skills, and directing the reading interests of the 
individual into wholesome experiences. 


Major objectives and outcomes in reading and listening 


The necessity for the development of a high level of reading ability 
along with effective skill in critical and evaluative listening on the 
part of all children and adults is more readily realized when it is
recognized that a large share of the vast bulk of facts, information,
and skills they are supposed to master are obtained through these 
avenues. The real significance of the matter is seen in the fact that 
the level of reading and listening ability on the part of many of these 
individuals is not particularly high. 

The extremely wide variety of school and life situations in which 
children and adults read or listen is indicated in the following list 
of reading and listening objectives, attitudes, and abilities. The
outline itself is an adaptation of material presented by Greene and
Gray⁴ in a discussion of the measurement of understanding in the
language arts.


A FUNCTIONAL ANALYSIS OF READING AND LISTENING 


I. Objectives in Reading and Listening 
A. Typical life situations which lead children and adults to read 
or listen 
1. To find out what is going on
2. To find one's way about 


4 Harry A. Greene and William S. Gray, “The Measurement of Understanding in 
the Language Arts.” The Measurement of Understanding, Forty-Fifth Yearbook of 
the National Society for the Study of Education, Part I. University of Chicago 
Press, Chicago, 1946. p. 189-200. Quoted by permission of the Society. 
3. To understand directions and assignments
4. To verify spellings, pronunciations, meanings, use of words
5. To secure answers to specific questions
6. To gather information for fuller understanding or for
informing or convincing others
7. To learn how to act in new situations
8. To work out complicated problems
9. To reach conclusions as to guiding principles, relative
values, or cause-effect relationships
10. To identify and resolve propaganda
11. To search for and discover the truth
B. Typical recreational situations which lead children and adults
to read or listen
1. To relive everyday experiences
2. To have fun or sheer enjoyment
3. To escape from real life
4. To satisfy curiosities about strange times and places,
human nature, and motives
5. To enjoy sensory imagery
6. To enjoy ready-made emotional reactions through hearing
or reading romantic tales, sentimental verses, mystery stories
7. To enjoy the sentiments and ideals expressed by others
8. To enjoy the rhythm and quality of expression in both
prose and poetry

II. Basic Reading and Listening Knowledges, Attitudes, and Skills
A. Responding to the motive, problem, or purpose of a statement
B. Directing attention to the meaning of what is read or heard
C. Developing fluent, accurate perception of word forms
D. Recognizing and using new words and meanings
E. Securing an adequate understanding of what is read or heard
1. To grasp meanings of words appropriate to the context
2. To fuse word meanings into a chain of related ideas
3. To recognize the relationship and importance of ideas
4. To handle unusual word order, complex sentence structure,
abstract ideas
5. To interpret meaning in the light of the total setting, the
author's or the speaker's tone and intention
6. To supplement the specific meanings by reading between
the lines, drawing inferences, seeing implications
F. Reacting critically to what is heard or read
1. To realize the significance of the ideas presented
2. To judge the validity of the ideas presented
3. To evaluate the soundness, accuracy, or completeness of
the author’s or speaker’s conclusions, and the accuracy
of his reasoning
G. Blending the ideas acquired with previous experience
1. To acquire new insights
2. To reaffirm or modify previous understandings
3. To solve critical problems
4. To acquire rational attitudes
5. To modify behavior
6. To broaden interest

III. Attitudes, Skills, and Procedures Essential in Work-Study Type
Reading
A. Comprehending quickly what is read
1. To use rapid and rhythmic eye-movements
2. To avoid lip reading
3. To correctly associate symbols, words, and meanings
B. Locating needed information
1. To understand and use an index
2. To use a table of contents
3. To use the dictionary
4. To use library card files
5. To use reference books
6. To use and interpret maps, graphs, and tables
C. Gathering and evaluating information in the light of a given
purpose
1. To recognize the purposes to be achieved, by
a. Finding answers to specific questions
b. Finding the central thought of the selection
c. Following a sequence of related events
d. Enjoying the facts or the story presented
e. Identifying important points and supporting details
f. Selecting facts relating to the problem
g. Solving a specific problem
h. Understanding and following directions
i. Comparing the views of authorities
j. Supporting a point of view or a course of action
2. To apply appropriate fact-finding techniques such as

a. Studying the title for cue to its meaning

b. Reading carefully to discover what the author plans
to do or say

c. Noting especially topic sentences or paragraphs

d. Noting the author's method of arriving at his point

e. Grasping the author's organization of ideas

3. To separate essential from non-essential information

4. To judge the significance of relevant information

5. To organize information in terms of the specific problem
a. Summarizing
b. Outlining

6. To draw tentative conclusions defensible in the light of
the facts

7. To decide when the purpose has been achieved

8. To give credit to sources of facts and information


D. Adjusting reading attitudes and procedures to different purposes 

1. To select and remember relevant facts in reading to
answer factual questions 

2. To note the author's organization of facts, select, associate,
remember, and reorganize them in preparing a report

3. To select relevant facts, compare them with other known 
facts, and judge their validity in determining the accuracy 
of facts or events described 
a. Memorizing quickly 
b. Reporting without notes 

4. To read slowly and carefully when a thorough under- 
standing of relatively difficult material is involved 

5. To read rapidly when the purpose is to find out what is 
in the article or to enjoy a story 

6. To skim rapidly when hunting for relevant material or 
locating specific items of information 


IV. Attitudes, Skills, and Procedures Essential in Interpretative Oral 
Reading 


A. Insuring a thorough grasp of the author's meaning by utilizing 
those skills specified for the purpose in the foregoing outline 

B. Developing a clear, pleasant, properly modulated voice, clear 
enunciation of words, and correct pronunciation of words 

C. Having a compelling motive for reading to others 

D. Sensing the importance of the message for the listening audience 
E. Adjusting manner and speaking voice to the size of the room,
character of the selection, and needs of the audience 

F. Modulating voice to bring out thought relationships clearly 

G. Adjusting the voice to changes in character and mood 

H. Adjusting rates of reading and the grouping of words to the 
rhythm of poetry 

I. Using appropriate facial expression and gesture, subordinated
to the thought of the selection 

J. Controlling breathing and bodily movements 

K. Feeling confident of ability, free from tension, natural, sincere, 
and convincing in manner and speech 


The foregoing outline of the major objectives of listening and 
reading affords a useful basis for the evaluation of present instruc- 
tional emphasis in these two important acquisitive skill areas as well 
as a valuable source of criteria for the validation of analytical and 
corrective procedures in listening and reading. 


2 IDENTIFICATION OF FACTORS AFFECTING LISTENING 
AND READING 


Factors in listening efficiency 


It seems safe to assume that listening, like reading, is a composite 
of several somewhat independent but related skills. In addition to 
intelligence and reading comprehension, Nichols⁵ indicated that
such factors as the following appear to influence the individual's
listening comprehension:


1. Recognition of correct English usage
2. Size of the listener's vocabulary
3. Ability to make inferences
4. Ability to see the organization plan of a speech
5. Ability to listen for main ideas rather than for specific facts
6. Use of special techniques for the improvement of concentration
7. Real interest in the subject discussed
8. Physical fatigue
9. Audibility of the speaker
10. Respect for listening as a means of learning
11. Susceptibility to distractions
12. Experience in listening



5 Ralph G. Nichols, “Factors in Listening Comprehension.” Speech Monographs, 
15:154-63; Research Annual, No. 2, 1948.
In an attempt to identify the basic measurable factors in listening
comprehension, Brown⁶ concluded that the following skills are
involved in the effective use of listening as a learning instrument:


1. Identification and recall of details presented orally
2. Ability to follow the sequence of details in the form of oral directions
3. Retention of details long enough to answer questions about them
4. Ability to listen reflectively for the purpose of identifying the central
idea of the statement given orally
5. Ability to draw inferences from the supporting facts presented in the
statement
6. Ability to distinguish relevant from irrelevant materials
7. Use of contextual clues to word meanings
8. Recognition of transitional elements in sentences


On the basis of these factors Brown proposed an analytical test 
of listening comprehension. 


Typical defects in reading 


The solution of the problem of the effective initial teaching of 
reading as well as the development of satisfactory remedial materials 
in reading is dependent to a large degree on the accurate identification 
of the specific causes of reading failure. Not only is it necessary to 
discover the child who in his later school experience is almost certain 
to encounter reading difficulties, but these reading difficulties must 
be identified definitely and accurately. Harris listed and discussed at
length⁷ the following causes of reading difficulties: (1) low
intelligence, (2) visual defects, (3) auditory defects, (4) other physical
conditions (defects of muscular coordination and speech, glandular
disturbances, and neurological difficulties), (5) lack of hemispherical
dominance, (6) poor school record, (7) deficiencies in arithmetic,
spelling, and handwriting, and (8) emotional and social problems.
He pointed out, however, that it is “impossible to determine the
relative contribution of each handicap to the total picture of failure. . . .
From a practical standpoint, the aim of a thorough diagnosis is not
to fix the blame for the child's difficulties, but to discover each of
the many conditions that may require correction.”⁸

6 Brown, op. cit. p. 140-41.

7 Albert J. Harris, How to Increase Reading Ability: A Guide to Individualized
and Remedial Methods, Second edition. Longmans, Green and Co., New York,
1947. Chapter 7.

8 Ibid. p. 242.
Oral vs. silent reading


An examination of the major aspects of remedial work in reading 
indicates that there are two angles from which it may be considered. 
In the first place, remedial instruction may be begun in the oral 
reading field. Gray and others have defended this point of attack 
on the problem on the ground that it enables the teacher to start 
with the child on a level at which he already has some mastery, that 
is, the oral language level. Others believe that on account of the large 
proportion of reading time spent in the work-type of silent reading 
this field should receive the special emphasis. There is merit on both 
sides of the question, undoubtedly. It is true that the child does come 
to school with a fairly adequate oral vocabulary, which in a great 
many ways affords the natural approach to reading. On the other 
hand, it is also true that such an approach tends to place too large 
an emphasis on the pronunciation of words and too little on their 
meaning when encountered in silent reading situations. The transfer 
from the emphasis on oral reading (pronunciation of words) to silent 
reading (comprehension of meaning of words, sentences, and para- 
graphs) must be made at some point in the child’s experience. Ac- 
cordingly, a great many teachers hold that the place to start the 
emphasis on silent reading is at the beginning. Some foundation for 
this belief is seen in the results obtained by many teachers who place 
the emphasis on the development of silent reading skills at the outset. 


Readability 


During the past decade teachers have shown a surprising increase 
of interest in the objective, valid, and reliable evaluation of the 
suitability of textbooks and other reading material for classroom use. 
One aspect of this type of textbook evaluation involves readability, 
or the understandability of printed material. Betts⁹ pointed out that


current interest appears to be centered on the language and the content 
of reading material. Recent workers have concerned themselves with 
relationships between these factors in readability: vocabulary difficulty, 
vocabulary diversity, sentence length or structure, “human interest,” 
and meaning. On the basis of these factors, formulae have been derived 
for predicting the difficulty of reading material. Objective measures of 
readability are given precedence over author and teacher judgments. 


9 Emmett A. Betts, “Readability: Its Application to the Elementary School.”
Journal of Educational Research, 42:438-59; February 1949. 
The early work of Vogel and Washburne in preparing the Winnetka
Graded Book List provided the basis for the concept of readability
as well as for the general method of measuring it. In their first
formula they weighted four factors: (1) number of different words,
(2) number of uncommon words, (3) number of prepositions, and
(4) the number of simple sentences.¹⁰ Ten years later they reported a
revised readability formula based upon only three elements: number
of different words, number of uncommon words, and the number of
simple sentences.¹¹ According to Lorge, “the pattern established by
this formula has been followed by Lewerenz (1929), Ojemann (1933),
Dale and Tyler (1934), Gray and Leary (1935), Lorge (1939),
Flesch (1943), and by Dale and Chall (1948). In each instance a
multiple regression formula was developed relating a criterion and
some internal indications of expressional difficulty.”¹² Four kinds
of elements appear to have been considered in most of these
readability formulae: (1) vocabulary load, (2) sentence structure, (3) idea
density, and (4) human interest. Yoakam, in an early unpublished
study, indicated that vocabulary load was a sufficiently reliable index
of readability. Lorge¹³ concluded that a weighted index of
vocabulary load is one of the best measures of difficulty in texts
planned for use below the fourth or fifth grades.

Readability formulae, if used within proper limits, make available
to the classroom teacher very valuable means of evaluating written
materials presented to the child. It should be understood that thus
far they reflect only certain mechanical aspects of understandability.
They do not necessarily reveal anything concerning the quality,
complexity, or relationships of the ideas expressed.
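
The mechanical character of such formulae can be seen in a small sketch. The Python code below counts surface features of the kind named above (different words, uncommon words, prepositions) and combines them in a linear, regression-style index. It is an illustration under stated assumptions and not a reproduction of any published formula: the word lists are tiny invented stand-ins, the weights are placeholders chosen only to show the arithmetic, and mean sentence length is substituted for the count of simple sentences, which would require grammatical analysis.

```python
import re

# Hypothetical stand-ins: a real study would use a published word list
# and regression weights fitted against a criterion of difficulty.
COMMON_WORDS = {"the", "a", "dog", "ran", "to", "park", "and", "it", "was", "in"}
PREPOSITIONS = {"to", "in", "on", "at", "of", "by", "for", "with", "from"}


def readability_features(text, common_words=COMMON_WORDS):
    """Count surface features of the kind used in early readability formulae."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    different = set(words)
    return {
        "different_words": len(different),
        "uncommon_words": len(different - common_words),
        "prepositions": sum(1 for w in words if w in PREPOSITIONS),
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


def difficulty_index(features, weights, intercept=0.0):
    """Linear combination of features, in the manner of a regression formula."""
    return intercept + sum(weights[name] * features[name] for name in weights)


sample = "The dog ran to the park. It was sunny in the park."
features = readability_features(sample)

# Placeholder weights, not taken from any study.
weights = {"different_words": 0.1, "uncommon_words": 0.5, "mean_sentence_length": 0.2}
index = difficulty_index(features, weights)
print(features)
print(round(index, 2))
```

Nothing in these counts touches the quality or relationships of the ideas expressed, which is the limitation noted in the text.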


Measuring listening comprehension 


Tests for the measurement of the comprehension of orally presented
material at all school levels are limited in number and in
scope. In fact, no standardized listening comprehension test is
available at the elementary-school level and only one at the secondary
level is known to the authors. Some very interesting experimental
work on the validation of a listening comprehension test for the
intermediate grades has been reported. In the light of the great
importance of listening as a learning tool, it is not clear why so little
has been done with it at the upper intermediate, secondary, and
college levels.

10 Mabel Vogel and Carleton Washburne, “An Objective Method of Determining
Grade Placement of Children's Reading Material.” Elementary School Journal,
28:373-81; January 1928.

11 Carleton Washburne and Mabel Vogel, “Grade Placement of Children’s
Books.” Elementary School Journal, 38:355-64; January 1938.

12 Irving Lorge, “Readability Formulae: An Evaluation.” Elementary English,
26:86-95; February 1949.

13 Ibid. p. 92.

A lack of auditory discrimination was recognized by Murphy and
Durrell¹⁴ as one of the three most important causes of pupil failure
in learning to read. The first section of the Murphy-Durrell
Diagnostic Reading Readiness Test consists of 84 items designed to
determine the ability of the pupil to recognize similarities and differences
in the sounds of words by comparing the name of a picture and the
sound of a word. The first group of items tests the pupil’s ability to
sense the differences in the beginning sounds of words; the second
group tests the ability to sense final sounds.

The Brown-Carlsen Listening Comprehension Test¹⁵ is a carefully
validated test standardized for use with high-school students and
college freshmen. It appears to be a very useful supplementary
instrument in connection with analytical and remedial work in many
achievement areas. The test calls for pupil reactions to 76 test
questions divided into five parts as follows: (1) Immediate Recall, (2)
Following Directions, (3) Recognition of Transitions, (4) Recognizing
Word Meanings, and (5) Lecture Comprehension. Naturally
the administration of the test is entirely oral, but it is not difficult to
give to regular class groups within a class period. All responses are
recorded on machine-scorable answer sheets. Percentile norms afford
an easy means of interpreting the test results by grades.
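
The way percentile norms are read can be illustrated briefly. The Python sketch below uses one common definition of percentile rank: the percentage of the norm group scoring below a given score, plus half of those scoring exactly at it. The norm-group scores are invented for illustration and are not the published norms of this or any other test; the function name is likewise hypothetical.

```python
def percentile_rank(score, norm_scores):
    """Percentile rank: per cent below the score plus half the per cent at it."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    at = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * at) / len(norm_scores)


# Invented raw scores for a hypothetical norm group of ten pupils.
norm_group = [31, 38, 42, 45, 45, 50, 53, 57, 61, 68]

print(percentile_rank(45, norm_group))  # 40.0
print(percentile_rank(68, norm_group))  # 95.0
```

In practice the teacher would look up a raw score in the publisher's norm table for the pupil's grade rather than compute the rank from raw data.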


Performance tests in silent reading 


The Betts Ready to Read Tests¹⁶ include not only reading readiness
tests but also tests for the diagnosis of difficulties for pupils who do
not read normally. The tests for oculomotor and perception habits
require the use of a series of slides and the Betts-Keystone
Telebinocular, a type of stereoscope that provides a scaled holder
adjustable for various distances. The tests measure fusion, visual acuity,
muscular balance, eye coordination, depth perception, and
astigmatism. Their purpose is not to diagnose visual defects as a basis for
prescription but to locate pupils who should be referred to eye
specialists for examination and remediation.

14 Helen A. Murphy and Donald D. Durrell, Murphy-Durrell Diagnostic
Reading Readiness Test. World Book Co., Yonkers, N. Y., 1947.

15 James I. Brown and G. R. Carlsen, Brown-Carlsen Listening Comprehension
Test. World Book Co., Yonkers, N. Y., 1952.

16 Distributed by Keystone View Co., Meadville, Penn.

The Ophthalmograph¹⁷ is a binocular eye-movement camera used
to obtain a simple and objective record of eye movements during the 
reading process. Information is provided on a film strip concerning 
the number of eye fixations, recognition span, regressive eye move- 
ments, rhythm, reading speed, and coordination of the eyes. Charts 
are provided for use in easy determination of total reading time for a 
given number of words. This procedure measures the eye mechanics 
of reading, and should ordinarily be supplemented by a test of read- 
ing comprehension. 
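
The quantities read from such a record reduce to simple arithmetic. The Python sketch below computes reading rate in words per minute and expresses fixations and regressions per hundred words. The figures are invented for one hypothetical pupil, the function names are illustrative, and the calculation is offered as an assumption about what the supplied charts tabulate rather than a description of them.

```python
def reading_rate_wpm(words_read, seconds):
    """Words per minute from a word count and total reading time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return words_read * 60.0 / seconds


def per_hundred_words(count, words_read):
    """Express a count (fixations, regressions) per 100 words read."""
    if words_read <= 0:
        raise ValueError("words_read must be positive")
    return count * 100.0 / words_read


# Invented figures for one hypothetical pupil, not taken from any record.
words, seconds = 250, 75
fixations, regressions = 240, 30

rate = reading_rate_wpm(words, seconds)
fix_rate = per_hundred_words(fixations, words)
reg_rate = per_hundred_words(regressions, words)
print(rate, fix_rate, reg_rate)  # 200.0 96.0 12.0
```

Figures of this kind describe only the mechanics of reading, which is why a comprehension test is still needed alongside them.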

The Metronoscope¹⁸ is a device for exposing printed strips of
reading matter at desired rates of speed and can be used either with 
individuals or small pupil groups for testing and drill purposes. 

The Durrell Analysis of Reading Difficulty materials include a 
hand-operated tachistoscope, or device for exposing reading strips at 
desired rates, for use in determining word recognition and phrase 
comprehension. A test of oral reading measures phrase reading, voice,
enunciation, expression, and general word skills, and is accompanied 
by questions to test comprehension. 


3 ANALYSIS AND DIAGNOSIS IN SILENT READING 


Measurement of work-study type of reading 


The emphasis given to the work types of reading in the list of 
skills given on pages 395 to 399 indicates something of the importance 
of this type of reading in relation to the total reading field. Some 
pupils fail (in mathematics or science, for example), not entirely 
because of ignorance of the basic facts, or lack of mental ability to 
understand the explanations, but rather on account of sheer inability 
to read. In fact, one of the best ways to improve work in many other 
school subjects is to make a drive on the work type of reading ability. 


17 Distributed by American Optical Co., Southbridge, Mass. 
18 Ibid.

A recognition of this has caused makers of tests in reading to turn
their attention in this direction in recent years. A number of excel- 
lent reading tests which provide useful analytical information con- 
cerning a number of work-study skills are available for use in the 
secondary school. 

The Iowa Silent Reading Tests, New Edition, Advanced, are 
among the more recent and comprehensive tests designed to provide 
a detailed and analytical measure of silent reading abilities. These 
new quick-scoring tests go beyond the general survey of two or three 
phases of silent reading ability. They cover a wide range of skills 
essential to effective reading of the work-study type. Naturally they 
do not succeed in measuring all of the major objectives of reading 
as outlined in the opening pages of this chapter. The following 
summary shows the unit skills contributing to the pupils’ ability to 
read and to work with books which are measured by these reading
tests:¹⁹


Test 1. Rate and Comprehension 
Science material 
Social studies material 


Test 2. Directed Reading 
Science material 


Test 3. Poetry Comprehension 


Test 4. Word Meaning 
Social sciences 
Sciences 
Mathematics 
English 
Test 5. Sentence Meaning 
Test 6. Paragraph Comprehension 


Test 7. Location of Information 
Use of the index 
Selection of key words 


19 H. A. Greene, A. N. Jorgensen, and V. H. Kelley, Manual of Directions: Iowa
Silent Reading Tests, New Edition, Advanced. World Book Co., Yonkers, N. Y.,
1939. p. 3.

For a number of years a self-initiated and incorporated group
known as the Committee on Diagnostic Reading Tests has been
working on a rather comprehensive program of reading test
construction. The results of this development, the Diagnostic Reading
Tests,²⁰ are now available in two levels. The Upper Level, for use
in Grades 7 through the college freshman year, is available in eight
comparable forms. The survey, or general reading, section is designed
to measure the student's rate of reading and comprehension of
interesting story-type material with a relatively simple vocabulary load.
The vocabulary section consists of 60 items from general vocabulary
and from the fields of English, mathematics, science, and social
studies. The comprehension section consists of four selections of
content material of the type found in social studies and science
textbooks.
The Kelley-Greene Reading Comprehension Test measures certain
elements of the student's ability to read, comprehend, and remember
the type of material he encounters in connection with his high-school
course work. Test 1 consists of nine specially prepared paragraphs
dealing with many different types of content. The sentences in each
paragraph are numbered. The five comprehension exercises
accompanying each paragraph are to be answered by indicating on the
separate answer sheet the number of the sentences that provide the
answers. The accompanying extract representing Paragraph V from
Form AM of this test is presented here to illustrate the nature of
these comprehension exercises.


Excerpt from Kelley-Greene Reading Comprehension Test²¹


PARAGRAPH V 


¹ When a piece of paper burns, it is completely changed. ² A part of
it is turned into a hot gas. ³ The ash that is left behind does not look
like the original piece of paper. ⁴ When dull-red rust appears on a piece
of tinware, it is quite different from the gleaming polished metal. ⁵ The
tarnish that forms on silverware exposed to the air is a new substance
unlike the silver itself. ⁶ Animal tissue is unlike the vegetable substances
from which it is made. ⁷ A change in which the original substance is
turned into a different substance is called a chemical change.


20 Frances O. Triggs, chairman, Diagnostic Reading Tests. Science Research 
Associates, Chicago, 1950. 

21 Victor H. Kelley and Harry A. Greene, Kelley-Greene Reading Comprehension 
Test. Published by World Book Co., 1952. 
V


1. The best topic sentence for this paragraph is sentence — 
1. 4   2. 5   3. 7

2. Which key word or phrase tells best what this paragraph is about? 
1. transformation


2. chemical changes 
3. new substances 


3. Which is the best generalization from this paragraph?
1. Any change is known as a chemical change.
2. The growth of animal tissue is a physical change.
3. Chemical change involves forming a new substance from the 
original. 


4. Which of the following is an example of the type of change described 
in the paragraph? 
1. burning coal
2. freezing an ice cube 
3. dissolving sugar in water 

5. What type of change takes place when grass eaten by a cow is 
turned into milk? 
1. physical 2. chemical 3. transformation


Test 2 is based on three fairly long and factually-loaded articles 
dealing with science, social studies, and general content. In this test
the 72 comprehension exercises are answered by indicating the 
number of the sentence which provides the answer to a given ques- 
tion. Test 3 is a test of the retention of details and facts given in the 
three articles read in Part 2. The 34 multiple-choice exercises are 
arranged in cycled order from each of the three articles. 

This test, while not in itself specifically diagnostic, affords results 
that should provide a sound basis for the use of such analytical
instruments as the Iowa Silent Reading Tests, the Diagnostic Reading
Tests, or the Spitzer Study Skills Tests in connection with a remedi- 
ation program in these areas. 

The Spitzer Study Skills Test, because of its practical utility in
many subject-matter fields and its use of a number of ingenious
testing techniques, should attract the attention of high-school and
college teachers. This instrument is designed to measure the student's
knowledge of and his skill in using reference sources in the preparation
of his course assignments. The five skill fields tested are: (1)
Using the Dictionary, (2) Using the Index, (3) Knowledge of Sources
of Information, (4) Understanding Graphs, Tables, and Maps, and
(5) Organization of Facts in Note Taking. The accompanying excerpt
from Test 1 is presented as an illustration of the manner in
which the dictionary skills may be tested.


Excerpt from Spitzer Study Skills Test ²²


Sample Dictionary 


han·som (hăn′səm), n. a low-hung, two-wheeled, covered vehicle
for two passengers.


haph·ta·rah (häf′tä rä′, häf tôr′ə), n., pl. -roth (-rōth′). a portion
of the Prophets read immediately after a portion of the Pentateuch
in the Jewish Synagogue on festivals. [t. Heb.: conclusion]


har·ass (hăr′əs, hərăs′), v.t. 1. to trouble by repeated attacks,
incursions, etc., as in war or hostilities; harry; raid. 2. to disturb
persistently; torment, as with troubles, cares, etc. [t. F: s. harasser,
der. OF harer set a dog on]


hu·mic (hū′mĭk), adj. Chem. of or denoting something (as an acid)
derived from humus. [f. s. L humus ground, mould + -IC]


Hy·pe·ri·on (hī pĭr′ĭ ən), n. Gk. Myth. 1. a Titan, a son of Uranus
and Gaea: the father of Helios, Selene, and Eos. 2. (later) Apollo
[t. L, t. Gk.]


il·lume (ĭ lo͞om′), v.t., -lumed, -luming. Poetic. to illuminate.


is (ĭz), v. 3rd pers. sing. pres. indic. of be. [OE is, c. Icel. es, er;
akin to G ist, Goth. ist, L est, Gk esti, Skt. astí. See BE.]


jack¹ (jăk), n. 1. (cap.) a nickname for the name John. 2. a man or
fellow. 3. (cap. or l.c.) a sailor. 4. any of various mechanical
contrivances or devices, as a contrivance for raising great weights small
distances. 5. a device for turning a spit, etc. 6. U.S. any of the four
knaves in playing cards.


ka·bob (kə bŏb′), n. 1. (pl.) an oriental dish consisting of small pieces
of meat seasoned and roasted on a skewer. 2. Anglo-Indian. roast
meat in general. Also, cabob. [t. Ar.: m. kabab]


22 Herbert F. Spitzer, Spitzer Study Skills Test. Published by World Book Co.,
1953.
qui·pu (kē′po͞o, kwĭp′o͞o), n. (among the ancient Peruvians) a device
consisting of a cord with knotted strings of various colors attached,
for recording events, keeping accounts, etc. [t. Peruvian (Kechua):
lit., knot]


Ré·au·mur (rā′ə myo͝or′), adj. designating, or in accordance with,
the thermometric scale introduced by de Réaumur in which the freezing
point of water is at 0°, and the boiling point at 80°. See illus.
under thermometer. Abbr.: R. Also, Re′au·mur′.


Sois·sons (swä sôn′), n. a city in N France, on the Aisne river:
battles, A.D. 486, 1918, 1944. 18,174 (1946).

spark² (spärk), n. 1. a gay, elegant, or showy young man. 2. a beau,
lover, or suitor. —v.t. 3. Colloq. to pay attentions to (a woman);
court. —v.i. 4. Colloq. to engage in courtship; be the beau or suitor.
[either fig. use of SPARK¹, or metathetic var. of sprack lively, t.
Scand.; cf. Icel. sparkr sprightly]

spit² (spĭt), n., v., spitted, spitting. —n. 1. a sharply pointed,
slender rod or bar for thrusting into or through and holding meat to
be roasted at a fire or broiled. 2. any of various rods, pins, or the like
used for particular purposes. 3. a narrow point of land projecting into
the water. 4. a long, narrow shoal extending from the shore. —v.t. 5.
to pierce, stab, or transfix, as with a spit; impale on something sharp.
6. to thrust a spit into or through. [ME; OE spitu, c. D and LG spit]

truc·u·lent (trŭk′yə lənt, tro͞o′kyə-), adj. fierce and cruel; brutally
harsh; savagely threatening or bullying. [t. L: s. truculentus]
—truc′u·lence, truc′u·len·cy, n. —truc′u·lent·ly, adv. —Syn. See
fierce.

KEY: b., blend of, blended; c., cognate with; d., dialect, dialectal; der., derived from; 

f., formed from; g., going back to; m., modification of; r., replacing; s., stem of; 

t., taken from; ?, perhaps. 

KEY: ăct, āble, dâre, ärt; ĕbb, ēqual; ĭf, īce; hŏt, ōver, ôrder, oil, bo͝ok, o͞oze, out;
ŭp, ūse, ûrge; ə = a in alone; ch, chief; g, give; ng, ring; sh, shoe; th, thin; ᵺ,
that; zh, vision.


1. In the second pronunciation of harass, the sound of the first a is
like the a in —
1. chapter 2. cater 3. mature 4. marten


2. The letters adj. after Réaumur in the dictionary indicate that — 


5. the spelling is archaic. 

6. the word comes from the French province of Aude. 
7. the word is an adverb. 

8. the word is an adjective. 
3. The first syllable of Soissons rhymes with —
1. plow 2. boy 3. we 4. la (the musical note)


4. In what part of France is Soissons located? 
5. northern 6. southern 7. eastern 8. not given 


5. About how many people live in Soissons? 
1. 500 2. 2,000 3. 20,000 4. not given


15. What spelling of the word humic was used when it first became an
English word?
1. humus 2. humic 3. humik 4. not given


16. What is the Pentateuch used in defining haphtarah?
5. a portion of a Jewish ceremony
6. a portion of the Old Testament
7. a group of laments
8. a part of the synagogue


17. From what language did the word harass come? 
1. Greek 2. Latin 3. French 4. Anglo-Saxon


18. What was the relationship between Hyperion and Gaea? 
5. Gaea was Hyperion's mother. 
6. They were the same character. 
7. Gaea was Hyperion's father. 
8. The relationship is not given. 


The Reading Comprehension, C1, Test of the Cooperative English
Test battery is an instrument of general diagnostic, as well as survey, 
functions. The test consists of parts on vocabulary and reading, and 
is so scored as to result in meaningful scores on vocabulary, speed 
of comprehension, and level of comprehension, as well as in a total 
score. The illustration of one of the reading paragraphs and two of 
the four test items that are based on it will indicate something of the 
method used in the measurement of reading comprehension. 


Excerpt from Cooperative Reading Comprehension, C1, Test ²³


Leave America divided into 13 or, if you please, into three or four 
independent governments—what armies could they raise or pay—what 


23 Frederick B. Davis, Cooperative English Test, Reading Comprehension, C1,
Form K. Published by Cooperative Test Service, 1941.
fleets could they ever hope to have? If one was attacked would the
others fly to its aid and spend their blood and money in its defense?
Would there be no danger of their being flattered into neutrality by 
false promises? Although such conduct would not be wise, it would, 
nevertheless, be natural. 


15. The writer advocates 


15-1 a strong central government. 

15-2 a large fleet. 

15-3 American neutrality. 

15-4 a weak central government.

15-5 natural conduct. .......... 15( )


16. The passage states that it is natural for 
16-1 nations to help each other. 
16-2 nations to be deceived by other nations.
16-3 small nations to have weak armies and navies.
16-4 nations to make false promises.
16-5 America to be divided into small
independent states. .......... 16( )


The Diagnostic Examination of Silent Reading Abilities measures
ten aspects of silent reading ability and furnishes a reading index
which is a specialized type of derived score. The following list of the
part scores obtained for the test indicates its highly diagnostic nature
and its coverage of specific types of high-level work-study skills:
(1) rate of comprehension, (2) perception of relations, (3) vocabulary
in context, (4) vocabulary—words in isolation, (5) range of
general information, (6) central thought, (7) clearly stated details,
(8) interpretation, (9) integration of dispersed ideas, and (10) drawing
inferences. From the various items of Part III, scores of the
types listed in (6) to (10) above are obtained by means of the
scoring method used.

Among the other well-known reading tests, which cannot be il- 
lustrated here because of space limitations, are the California Read- 
ing Tests, the Gates Reading Survey for Grades 3 to 10, the Unit 
Scales of Attainment in Reading, the Nelson-Denny Reading Test, 
the Michigan Speed of Reading Test, the Traxler Silent Reading 
Test, the Traxler High School Reading Test, and the Schrammel- 
Gray High School and College Reading Test. 
Measurement of vocabulary


One of the major aims of instruction in all phases of English is 
the development of a rich and expressive vocabulary. While vo- 
cabulary tests are not necessarily classified as reading tests, the two 
types of tests are related in nature and function. Furthermore, many 
reading tests include parts on vocabulary, as in the Iowa Silent
Reading Test, Advanced, and the Diagnostic Reading Tests for grade 
seven through the college freshman year. 

The Durost-Center Word Mastery Test,²⁴ in addition to measuring
the general vocabulary level of secondary-school pupils as a single 
aspect of reading comprehension, should provide a very useful 
supplement to other reading comprehension tests not sampling the 
vocabulary fields. In Part 1 of the test 100 carefully selected words
are tested in multiple-choice form. Part 2 presents the same words 
in sentences with the same options as given in the first part. A com- 
parison of the two scores reveals the extent to which the student is 
able to acquire meanings of words from contextual situations. 

The Michigan Vocabulary Profile Test,²⁵ which is standardized for
use with secondary-school pupils and college freshmen, provides an 
extensive sampling into the following eight different technical vo- 
cabulary areas: Human Relations, Commerce, Government, Physical 
Sciences, Biological Sciences, Mathematics, Fine Arts, and Sports. 
The test consists of 240 items comprising a definition or description 
and four words or phrases, only one of which is completely and ac- 
curately defined or described. As a supplement to other general 
comprehension measures, this test should prove especially useful with 
high-school and college students. 

The Cole Teacher's Handbook of Technical Vocabulary lists those 
technical words in a number of secondary-school subjects that 
teachers have rated as most important. The words are the “thought
elements” that are indispensable factors for thinking in the subjects
represented. The author suggested that the lists may profitably be 
used as diagnostic tests in determining the specific weaknesses of a 
class or an individual pupil in technical vocabulary. Lists are now 
available for English composition, literature, foreign languages, 


24 Walter N. Durost and Stella S. Center, Durost-Center Word Mastery Test.
World Book Co., Yonkers, N. Y., 1952.

25 Edward B. Greene, Michigan Vocabulary Profile Test. World Book Co., 
Yonkers, N. Y., 1949. 
arithmetic, algebra, plane geometry, general science, biology, physics,
chemistry, geography, American history, and hygiene. 

Other vocabulary tests are the Cooperative Vocabulary Test, the 
Markham English Vocabulary Test, the Holley Sentence Vocabulary 
Scale, and the Seashore English Recognition Vocabulary Test. 


4 MEASUREMENT IN LITERATURE 


Tests of literary acquaintance and comprehension 


The measurement of outcomes in literature has quite largely been 
confined to the informations and skills resulting from instruction, 
since these objective features are easily observed in pupil behavior. 
Among the tests that measure acquaintance with literature are the 
Cooperative Literary Acquaintance Test, the Analytical Scales of 
Attainment in Literature, the Smith-Bixler Awareness Test in 20th 
Century Literature, and the Satterfield Objective Tests in English. 
These tests are primarily concerned with the informations that 
should result from the study of secondary-school literature. A second 
group of tests, which includes the Cooperative Literary Compre- 
hension Test, the Abbott-Trabue Exercises in Judging Poetry, and the 
Van Wagenen Reading Scales in Literature, is concerned mainly with 
those higher forms of reading skills that should develop from the 
study of literature. Neither the acquaintance nor the comprehension 
tests of these types are intended for the measurement of the relatively 
intangible appreciative outcomes of instruction. 


Tests of literary appreciation 


Although ability to comprehend literature seems to be an essential 
prerequisite to literary appreciation, adequate comprehension is not 
necessarily accompanied by appreciations. However, it seems justifi- 
able to conclude that the manner by which literary appreciation has 
been approached in most standardized tests in this field is mainly 
through a higher form of comprehension. Several tests that attempt 
to measure the appreciative aspects of literature are the Logasa-Wright
Tests for the Appreciation of Literature, the Carroll Prose
Appreciation Test, the Cook-Bixler Literature Appreciation Tests,
the Rigg Poetry Judgment Test, and the Cooperative Literary
Comprehension and Appreciation Test.
5 CORRECTIVE EXERCISES IN READING


Remedial drills for oral reading difficulties 


Dearborn, Huey, Gray, Buswell, and many other early investiga- 
tors, studying the problem of how to improve reading, noted that 
there is a marked relationship between the rate and quality of read- 
ing and the control the individuals have over their eye movements 
in reading. The meaning of eye movements may be readily under- 
stood by anyone who will take a position closely in front of and 
directly in the range of vision of a person engaged in reading. The 
observer will note that the reader's eyes do not move regularly and 
systematically forward as the reading progresses but that the move- 
ments are interspersed with pauses or fixation periods. It is during 
these pauses that the images of the words or groups of words are
secured. Carefully conducted laboratory experiments reveal the fact 
that good readers make longer sweeps with the eyes, take in larger 
units of words, pause for much shorter periods, and rarely retrace 
material once covered. Gates ²⁶ concluded that improper eye movements
are probably the evidences of other types of reading disability
that can be treated specifically. With the removal of the causes lying 
back of ineffective eye movements, the treatment of eye movements 
as such becomes unnecessary. On the whole this seems to be the most 
hopeful way of looking at the problem, since a great many teachers 
are qualified to administer types of remedial treatments that may be 
applied in the classroom but only a few have the technique or equip- 
ment for training the pupil in more effective eye movements. Gray 
himself recognized this practical aspect of a problem of remedial in- 
struction in reading and suggested a number of excellent exercises 
designed to overcome specific difficulties in reading, many of which 
are indicated by the ineffectual eye movements of the pupil. One of 
these exercises is reproduced here to illustrate types of material
adapted to remedying certain oral reading difficulties. 


26 Arthur I. Gates, The Improvement of Reading: A Program of Diagnostic and
Remedial Methods, Third edition. Macmillan Co., New York, 1947. p. 444.
Exercise To Increase Accuracy of Recognition ²⁷


1. Words which a pupil failed to recognize accurately while reading 
were used in sentences at the end of each period, in order that he 
might associate them with their meaning. The words which repeatedly 
caused difficulty were then typewritten on cards and used in quick- 
perception drills, by presenting them as rapidly as they were recog- 
nized. Such words as again, want, been, does, and heard were fre- 
quently emphasized. As soon as a pupil was able to recognize a word 
readily, drill on it was discontinued. New words were added to the 
list as difficulties were encountered. 

2. Words which a pupil confused because of their similarity in form were 
emphasized in drill exercises. These words included such groups as 
thought, though, and through, there and where, then and when, now 
and how, and has, had, and have. The words were used in sentences 
before they were presented in quick-perception drills. If unusual 
difficulties were encountered, words which were similar in form were 
presented together so that their differences could be studied. 


3. Pupils who recognized isolated words accurately frequently made 
errors in recognizing the same words in phrases and sentences. In 
order to overcome this difficulty a word, such as there, was written 
on the board in several phrases or short sentences and the pupil was 
given opportunity to study them deliberately. As soon as he was able 
to recognize these phrases readily they were typewritten on cards 
and presented in quick-perception drills. 


By practice in the use of such diagnostic reading devices as these 
and by training themselves in careful observation of their pupils,
teachers can become adept in the detection of particular reading 
difficulties. Furthermore, they will soon find that with practice they 
can become proficient in the art of building drill exercises. The sample 
exercises selected from a great many suggested by Gray, Gates, and 
others will be found valuable guides in the preparation of such
materials. By following the examples given here, the teacher can be
practically certain that he is using reading drill material whose efficacy
has been experimentally established.


27 W. S. Gray, Remedial Cases in Reading: Their Diagnosis and Treatment.
Supplementary Educational Monograph No. 22. University of Chicago Press,
Chicago, 1922.
Remedial drills for work-study types of reading difficulties


Possibly one reason for the rather marked instructional emphasis 
on the work-study type of reading to the exclusion of reading of the 
leisure types is that it is reasonable to expect a considerable carry- 
over of skills from work-type reading to the other type, because of 
the large number of common skills involved. For example, skill in 
recognition of word meaning, which functions in work-study reading, 
is probably similarly effective when the individual is reading solely 
for pleasure. 

A few illustrations of specific types of remedial exercises suited 
for use in silent reading of the work-study types are presented in the 
following pages in the hope that they may serve as a guide to teachers 
interested in the development of material of this type. Only a few 
samples from each field can be furnished. 

Word recognition. Exercises designed to develop skill in the recognition
of new word meanings appear in a great many forms, as for
example: (1) simple sentence completion; (2) agent-action; (3)
action-agent; (4) action-effect; (5) effect-action; (6) identification;
(7) opposites; (8) similars; (9) description; and (10) phrasing.

Location of information in books. A significant factor in the child's 
use of reading for work-study periods is his ability to locate informa- 
tion in books. The following suggestions may prove helpful: 


1. To develop in pupils an ability to use the index, children should: 
(a) be taught the alphabet; (b) be drilled in arranging words in 
alphabetical order; (c) be drilled in finding answers to questions by 
use of the index; (d) be asked to prepare indexes for books not pro- 
vided with them. 

2. To develop the ability to use a table of contents, pupils should: (a) 
be assigned lessons by topic or titles; (b) find the assigned lessons 
in the text by means of the table of contents; (c) find additional 
sources of information on the assignment in the library. 


Organization of material. The ability to organize what is read is a 
necessary part of the equipment of everyone who expects to become a
good student. Organization of reading materials calls for a superior 
type of judgment. The following suggestions will aid teachers in 
developing a variety of types of practice in organization:


1. Practice in deciding upon the main thought in the paragraph or topic.

2. Drill in outlining a study, an assignment, a reference reading, a poem.

3. Practice in analyzing the organization of selections.

4. Practice in restating the substance of a difficult passage to convey the 
same idea in simplified form. 

5. Practice in selecting the most appropriate title for a selection. 


Other remedial materials in reading 


The recent analytical work of Betts, Durrell, Gates, Gray, Harris, 
Stroud, and others opens up great possibilities for diagnostic and 
corrective work. No longer need diagnosis in reading be confined to 
such vague and general qualities as rate and comprehension of word 
meanings. In fact, it now becomes quite apparent that many of the 
so-called diagnostic tests in this field are not at all suited for the 
specific types of diagnosis required in the identification of reading 
disabilities. While considerable progress has been made in the last 
decade in the more exact analysis and identification of underlying 
causes of reading disability, the development of adequate initial in- 
structional materials and corrective devices has not kept pace with 
the analytical work. The next decade is almost certain to see much 
progress along these lines. 

Commercial materials designed for instructional testing purposes 
and remedial uses in reading are available in many different forms. 
At present there are almost countless drill books and workbooks 
having for their purpose the development of silent reading skills of 
the work-study type. However, this material has mainly emphasized 
the problems of teaching beginners to read by some particular method 
rather than that of providing corrective treatment for some basic
disability.


Topics for Discussion 
1. State your idea of the relative importance of the receptive and the
expressive language arts skills. Would human society be able to
dispense with either?

2. Evaluate the school importance of listening as a receptive language
skill area. Is listening as important socially as silent reading abilities
or the study skills? Defend your position.

3. Why would you expect reading disability to be reflected in classroom
achievement?

4. In what specific ways does modern life place a particular burden on
the ability to read rapidly and well?

5. What are the essential differences between reading for appreciation,
as in literature, and reading for acquisition of facts, as in the study 
of the content subjects? 

6. In what high-school subjects would you expect achievement to be 
definitely affected by reading disabilities? Why? 

7. What emphasis should oral reading be given in high school? Where 
does it belong, in English or in speech training? 

8. Are improper eye movements in reading causes of reading defi- 
ciencies or merely evidences of such defects? 

9. From your reading and observation, make a list of the major read- 
ing defects as encountered among high-school pupils. 

10. Prepare sets of remedial exercises suitable for high-school use fol- 
lowing out two of the suggestions for drills on organization diffi- 
culties (see pages 416 and 417). 

11. To what extent is literary appreciation due to information? Do
people like the things they know about, and therefore appreciate 
the things about which they are well-informed? 

12. Check the list of Essential Skills Involved in Work-Study Types of
Reading against the skills specified for measurement in the Iowa
Silent Reading Tests.

13. Study the type of test shown on pages 408 to 410 and prepare a 
set of suggestive drill exercises for the development of skill of this 
type. 

14. Prepare suggestions for the increase in the pupil’s span of attention
in reading. 

15. Is there any special reason to assume that remedial practices that
are helpful in curing oral reading defects may not also be useful in 
correcting silent reading deficiencies? 


Selected References 


ANDERSON, IRVING H., AND DEARBORN, WALTER F. The Psychology of
Teaching Reading. New York: Ronald Press Co., 1952. 

BETTS, EMMETT A. Foundations of Reading Instruction with Emphasis
on Differential Guidance. New York: American Book Co., 1946. 

BLAIR, GLENN M. Diagnostic and Remedial Teaching in Secondary
Schools. New York: Macmillan Co., 1947. 

BROWN, JAMES I. “The Construction of a Diagnostic Test of Listening
Comprehension.” Journal of Experimental Education, 18:139-46; 
December 1949. 

BROWN, JAMES I. Efficient Reading. Boston: D. C. Heath and Co.,
1952. 




BUROS, OSCAR K., editor. The Fourth Mental Measurements Yearbook.
Highland Park, N. J.: Gryphon Press, 1953. p. 317-23, 333-35,
567-620.

BUROS, OSCAR K., editor. The Nineteen Forty Mental Measurements
Yearbook. Highland Park, N. J.: Mental Measurements Yearbook, 
1941. p. 140-43, 336-79. 

BUROS, OSCAR K., editor. The Nineteen Thirty Eight Mental Measurements
Yearbook. New Brunswick, N. J.: Rutgers University Press,
1938. p. 79-80, 124-40, 155-56. 

BUROS, OSCAR K., editor. The Third Mental Measurements Yearbook.
New Brunswick, N. J.: Rutgers University Press, 1949. p. 247-57, 
501-71. 

DEAL, ADA B., AND SEAMANS, ALBERT. “Group Remedial Reading in
High School.” English Journal (High School Edition), 26:355-62; 
May 1937. 

DOLCH, EDWARD W. “Testing Reading with a Book.” Elementary
English, 28:124-25; March 1951.

DURRELL, DONALD D. Improvement of Basic Reading Abilities. Yonkers,
N. Y.: World Book Co., 1940. 

EBERHART, W. “The Teaching of Literature: An Approach to Evaluation.”
Educational Research Bulletin, 17:1-6; January 19, 1938.

FINCH, F. H., AND GILLENWATER, V. W. “Reading Achievement Then
and Now.” Elementary School Journal, 49:446-54; April 1949.

GATES, ARTHUR I. The Improvement of Reading: A Program of Diag- 
nostic and Remedial Methods. Third edition. New York: Macmillan 
Co., 1947. Chapter 3. 

GATES, ARTHUR I. “The Measurement and Evaluation of Achievement in 
Reading.” The Teaching of Reading: A Second Report. Thirty-Sixth 
Yearbook of the National Society for the Study of Education, Part I. 
Bloomington, Ill.: Public School Publishing Co., 1937. Chapter 12. 

GERBERICH, J. RAYMOND. “The First of the Three R’s.” Phi Delta
Kappan, 33:345-49; March 1952. 

GRAY, WILLIAM S. “Contributions of Research to Special Methods:
Reading.” The Scientific Movement in Education. Thirty-Seventh 
Yearbook of the National Society for the Study of Education, Part II. 
Bloomington, Ill.: Public School Publishing Co., 1938. Chapter 7. 

GRAY, WILLIAM S. “Reading.” Encyclopedia of Educational Research.
Revised edition. New York: Macmillan Co., 1950. p. 965-1005. 

GRAY, WILLIAM S., HORSMAN, GWEN, AND MONROE, MARION. Basic
Reading Skills for High School Use. Chicago: Scott, Foresman and 
Co., 1948. 

GRAY, WILLIAM S., chairman. The Teaching of Reading: A Second Report.
Thirty-Sixth Yearbook of the National Society for the Study of
Education, Part I. Bloomington, Ill.: Public School Publishing Co.,
1937.

GREENE, HARRY A., AND GRAY, WILLIAM S. “The Measurement of
Understanding in the Language Arts.” The Measurement of Under- 
standing. Forty-Fifth Yearbook of the National Society for the Study 
of Education, Part I. Chicago: University of Chicago Press, 1946. 
p. 189-200. 

HARRIS, ALBERT J. How To Increase Reading Ability: A Guide to
Individualized and Remedial Methods. Second edition. New York:
Longmans, Green and Co., 1947. 

JORDAN, A. M. Measurement in Education. New York: McGraw-Hill
Book Co., Inc., 1953. Chapter 5. 

KELLEY, VICTOR H., AND GREENE, HARRY A. Better Reading and Study
Habits. Yonkers, N. Y.: World Book Co., 1947. 

LESTER, JOHN A., AND LINDQUIST, E. F. “Examinations in English.”
The Construction and Use of Achievement Examinations. Boston: 
Houghton Mifflin Co., 1936. p. 381-410. 

MONROE, MARION. “Diagnosis and Treatment of Reading Disabilities.”
Educational Diagnosis. Thirty-Fourth Yearbook of the National
Society for the Study of Education. Bloomington, Ill.: Public School
Publishing Co., 1935. Chapter 12.

MONROE, MARION, AND OTHERS. “Diagnostic and Remedial Procedures
in Reading.” Educational Record, 19:105-13, Supplement No. 11; 
January 1938. 

MONROE, MARION, AND OTHERS. Remedial Reading. Boston: Houghton
Mifflin Co., 1937. 

MORSE, HORACE T., AND MCCUNE, GEORGE H. Selected Items for the
Testing of Study Skills. National Council for the Social Studies, 
Bulletin No. 15. Washington, D. C.: National Education Association, 
September 1940. 

POOLEY, ROBERT C. “English—Literature.” Encyclopedia of Educational
Research. Revised edition. New York: Macmillan Co., 1950.
p. 396-403.

REED, LULU R. “A Test of Students’ Competence to Use the Library.”
Library Quarterly, 8:236-83; April 1938. 

ROBINSON, FRANCIS P. Effective Study. New York: Harper and
Brothers, 1946. 

SMITH, DORA V. “Recent Procedures in the Evaluation of Programs in
English.” Journal of Educational Research, 38:262-75; December 
1944. 

SPACHE, GEORGE. “The Construction and Validation of a Work-Type
Auditory Comprehension Reading Test.” Educational and Psycho- 
logical Measurement, 10:249-53; Summer 1950. 




STROUD, J. B., AND AMMONS, ROBERT B. Improving Reading Ability.
New York: Appleton-Century-Crofts, Inc., 1949. 

THORNDIKE, EDWARD L. A Teacher's Word Book of the Twenty Thousand
Words Found Most Frequently and Widely in General Reading
for Children and Young People. New York: Bureau of Publications,
Teachers College, Columbia University, 1931.

TRAXLER, ARTHUR E. “Measurement in the Field of Reading.” English
Journal, 38:143-49; March 1949. 

WITTY, PAUL. “Approach to Better Reading: An Evaluation.” Educational
Administration and Supervision, 25:81-92; February 1939.

WITTY, PAUL A., AND KOPEL, DAVID. “Evaluating Reading and Remedial
Reading.” English Journal, 26:449-58; June 1937. 

WITTY, PAUL A., AND KOPEL, DAVID. “Preventing Reading Disability:
The Reading Readiness Factor.” Educational Administration and 
Supervision, 22:401-18; September 1936. 

WOOD, BEN D., AND HAEFNER, RALPH. Measuring and Guiding Individual
Growth. New York: Silver Burdett Co., 1948. p. 285-91.


16 


Measuring and Evaluating in the 
Expressive Language Arts 


THIS CHAPTER presents a summary of the following points involved 
in the measurement of English and related subjects: 


A. Educational and social significance of English.
B. Specific aims and outcomes of instruction in English.
C. Specific language skills, oral and written.
D. Measurement of oral English skills.
E. Analysis of mechanics of written composition.
F. Diagnosis and analysis of language usages.
G. Measurement and evaluation in spelling.
H. Measurement and evaluation in handwriting.


The expressive language arts, consisting of English, grammar, 
speech, spelling, and handwriting, are discussed in this chapter. 
These, together with reading, listening, and the work-study skills 
discussed in the preceding chapter, round out the English language 
arts subjects ordinarily stressed in the school program. 


1 AIMS AND OUTCOMES OF INSTRUCTION IN ENGLISH


The educational and social importance of English 


Mastery of the tools of expression and communication as repre- 
sented by such school subjects as composition, language usage, 
grammar, spelling, and handwriting unquestionably constitutes the 
most fundamental of all school outcomes. The fact that instruction
in each of these fields begins with the child's first days of school and 
in some form continues to receive needed emphasis throughout the 
individual's life may be taken as the best evidence of its educational 
and social importance. The fact that the individual rarely achieves 
adequate mastery over these important expressional skills demon- 
strates their great complexity and difficulty of learning. It is signifi- 
cant that English, which was one of the first subjects to be measured, 
is one of the slowest to respond to analysis, diagnosis, and remedial 
treatment. Possibly this is due in part to the formal methods of 
instruction in this subject followed by many teachers. It is more 
likely to be due, however, to the sheer complexity of the subject 
itself, and the many forms in which it expresses itself. 

English is a distinctly social subject. The language skills function 
to make communication pleasant and possible. Accordingly, the 
activities of the English classroom must provide for social conditions 
under which communication can take place. The pupil must have 
something to transmit. He must write or speak to someone for a 
definite purpose. Whether he wishes to inform, to convince, to inspire, 
or merely to entertain, he must have a reasonable mastery of such 
tool skills of expression as sentence sense, language usage, mechanics 
of writing, and spelling. 

Viewed in this way, English is much more than the specific lan- 
guage skills utilized in recording and transmitting ideas. It involves 
the skills required for the acquisition of ideas, either through the ear 
or through the eye, the use of higher mental centers for the critical 
classification and evaluation of the ideas, and also the more com- 
monly known language skills. Thus, there are English skills that are 
definitely receptive and others that are as definitely productive. Eng- 
lish delves into many different subject fields, since it has no real 
content of its own. This view of English makes very definite tech- 
nical demands and at the same time assumes an unusually rich 
cultural background on the part of the English teacher. 

In this discussion, language skill is considered to mean facility in 
the use of the proper language habits and forms essential to effective 
intercommunication at a particular cultural level. Such a point of 
view makes reasonably clear the problems of the teacher of English. 
Language skills arise, as do other specific skills, through the proper 
exercise of the desired habits. Obviously, proper exercise is possible 
only when proper identification of the habits has taken place. Hence 
the habits upon which language skill depends must be identified,
and much carefully constructed instructional and drill material 
must be provided. This in itself will serve two useful purposes. 
First, the use of good language drill material insures that the 
pupil will have experience in making the correct response to selected 
language situations either with or without the assistance of such 
formal grammar instruction as may be applied. Second, the use of 
such material sets up in the pupil's mind an attitude toward language 
error. A definite consciousness toward errors in English usage must
be developed. 


Analysis of language skills 


It must be evident from what has been presented in earlier chapters 
in this book that an accurate analysis of the underlying skills in 
language is necessary before any significant program of diagnostic 
and corrective work can be undertaken. In the past, certain general 
language abilities have been identified for measurement purposes, 
such as language usage, grammar, and composition. In more recent 
years, however, there has been an effort to reduce language in general 
to its more elementary or basic skills. 

The task of clarifying the statements and purposes of English
instruction and of analyzing and identifying the basic language 
skills is not one that will be successfully accomplished by any 
one person. At the outset it must be recognized that there are 
many conditions under which language functions. There is
undoubtedly a language of impression, or comprehension, as well as a
language of expression. The former is the aspect of language that is 
given particular attention in reading instruction. The latter, the 
language of expression, is the phase usually meant by the term 
“language ability,” and is the phase that receives special attention in 
language instruction. 

The program of instruction in English expression must equip the 
child to engage successfully in certain speaking and writing ac- 
tivities wherever he may encounter them, either in school or out of 
school. In a discussion of the program in language arts, McKee¹
pointed out that the school is responsible for providing definite
instruction and experiences in each of the following important speaking
and writing situations encountered in school and in life outside
the school: (1) taking part in conversation and discussions, (2) using
the telephone, (3) taking part in meetings, (4) giving reports, (5)
telling and writing stories, (6) giving reviews and reports, (7) giving
directions and explanations, (8) making announcements, (9) giving
descriptions, and (10) writing letters. Participation in these language
activities means that the child must make use of a great many
specific abilities, understandings, skills, and attitudes, which, from
the language instructional point of view, represent the objectives
of the language curriculum.

¹ Paul McKee, “An Adequate Program in the Language Arts.” Teaching
Language in the Elementary School. Forty-Third Yearbook of the National
Society for the Study of Education, Part II. Department of Education,
University of Chicago, Chicago, 1944. Chapter 2.

The accompanying outline of language outcomes and objectives is 
a compilation and an adaptation of material from several sources. 
To develop a complete and perfect outline of language objectives is 
almost certainly a hopeless task. In spite of certain logical and 
psychological shortcomings which this outline may possess, it never- 
theless gives the teacher helpful suggestions for the identification of 
useful language skills. The teacher and the student will, of course, 
wish to revise such an outline from time to time to keep it in line 
with the best research evidence in the field. 


LANGUAGE OUTCOMES AND OBJECTIVES²


I. Oral Language

   A. General Outcomes of Oral Language
      1. To form correct habits of articulation, enunciation
      2. To assume proper and pleasing body position and mannerisms
         when speaking
      3. To use common courtesy in social groups
      4. To speak with feeling, reflecting meaning, thought, and
         interest
      5. To learn how to locate and give information
      6. To think while speaking
      7. To develop sentence sense
      8. To pronounce correctly such words as are used
      9. To learn how to acquire new words
     10. To speak at a rate suitable to conditions
     11. To use voice of suitable clarity and loudness
     12. To listen courteously and critically
     13. To recognize listening as an essential part of every
         conversational situation
     14. To develop judgment of what is suitable to talk about in
         conversational groups

   B. Special Oral Language Outcomes and Situations
      1. To relate anecdotes and incidents interestingly
      2. To make necessary simple announcements
      3. To participate with ease in conversation
      4. To take an active part in arguments and debates
      5. To learn to disagree or argue courteously
      6. To learn to listen, summarize, and report activities,
         events, news items, instructions
      7. To learn to react properly to social responsibilities, as an
         introduction, meeting a stranger
      8. To develop ability to assume an active part in school
         activities, as committee meetings, associations, classroom
         dramatizations, plays
      9. To learn to take the proper auditor-speaker attitudes
     10. To use the telephone properly

II. Written Language

   A. General Outcomes, Knowledges, and Skills Peculiar to Written
      Composition
      1. To answer letters promptly
      2. To develop judgment of the suitability of content for special
         situations, such as friendly letters, business letters,
         letters of sympathy
      3. To learn to use correct form in writing business and social
         letters, notes, invitations
      4. To learn to fill in common forms, blanks
      5. To acquire skill in writing notices, announcements, and
         advertisements, telegrams
      6. To show interest and skill in doing creative writing, such
         as stories, plays, editorials, diaries
      7. To acquire skill in making outlines from content material
      8. To record minutes of meetings, dictations by teacher
      9. To acquire skill in evaluating, organizing class and lecture
         notes
     10. To prepare an accurate and comprehensive bibliography

   B. Knowledges and Outcomes Peculiar to All Written Work
      1. To write legibly and rapidly
      2. To spell correctly socially useful forms
      3. To utilize proper manuscript forms
      4. To use proper outline forms
      5. To punctuate written work correctly
      6. To capitalize correctly

III. Knowledges and Skills Common to Both Oral and Written Language

   A. General Outcomes
      1. To develop a sincere desire to speak and to write correctly
      2. To develop a sensitivity to error in speaking and writing
      3. To learn to use sources of information, as dictionary,
         encyclopedias, reference books
      4. To develop skill in producing variety in sentence structure
      5. To identify types of sentences so that voice and punctuation
         can clearly indicate meaning
      6. To learn to expand the meaning vocabulary

   B. Correct Usages
      1. To master the most important grammatical usages, as pronouns,
         verb forms, subject-predicate relationships, redundancy,
         double negatives, antecedents

   C. Rhetorical Skills
      1. To develop sentence sense
      2. To develop variety in sentence structure
      3. To avoid faults in sentence structure, as useless introductory
         words, phrases, loose use of connectives
      4. To organize ideas in sentences so that the sentence says
         exactly what is meant
      5. To organize ideas and sentences in a paragraph around
         a single topic
      6. To organize ideas in a paragraph in proper sequence
      7. To avoid use of overworked words
      8. To develop through use a vocabulary rich in color, accuracy,
         conciseness, suitability, variety
      9. To stimulate interest through use of contrasts, concreteness,
         variety, simile-metaphors


² Adapted from the following sources: (1) Maude McBroom, The Course of
Study in Written Composition for the Elementary Grades. University of Iowa
Monographs in Education, First Series, No. 10. University of Iowa, Iowa City,
1928. (2) Iowa Elementary Teachers’ Handbook: Oral and Written Language.
State Department of Public Instruction, Des Moines, 1944. (3) Harry A. Greene
and H. L. Ballenger, Manual of Instructions: Iowa Language Abilities Tests. World
Book Co., Yonkers, N. Y., 1948. (4) Paul McKee, Language in the Elementary
School. Houghton Mifflin Co., Boston, 1939. (5) M. R. Trabue, chairman, Teaching
Language in the Elementary School. Forty-Third Yearbook of the National Society
for the Study of Education, Part II. Department of Education, University of
Chicago, Chicago, 1944. (6) Harry A. Greene and William S. Gray, “The Measurement
of Understanding in the Language Arts.” The Measurement of Understanding.
Forty-Fifth Yearbook of the National Society for the Study of Education,
Part I. University of Chicago Press, Chicago, 1946. p. 176-89.


Oral language skills 


The foregoing catalogue of outcomes indicates that language as a 
means of verbal expression appears in two main forms, oral and 
written. Success in the use of oral language depends in the first place 
on the ability of the speaker to so choose, arrange, and enunciate his 
words as to affect his hearers as he intends. In order to guarantee 
success in the operation of these skills, the pupil must be given 
training and practice in thinking and talking under audience condi- 
tions. In this training, emphasis must be placed on the development 
of a pleasant speaking voice, a gracious attitude, a clear enunciation 
of words, an avoidance of common language errors, care in the se- 
lection of words, a careful selection and organization of ideas, and 
skill in the clothing of his thoughts in the proper words so that he 
may affect his hearers as he intends by leading their thinking along 
prescribed channels. 

It is equally imperative that attention be given to the development 
of proper skills and attitudes on the part of the listener. Effective 
audience-speaker reactions are the result of an interplay of factors 
arising from the fact that (1) something of value is being
communicated, (2) between persons appreciative of the values, but (3)
possessing them at different levels of control. Thus the essential 
elements of the audience situation are present. The speaker has a 
message. The audience is ready to listen courteously and critically 
because useful and interesting information is being communicated. 


Written language skills 


The problem of written language takes a threefold form, although 
this is not apparent from the outline of outcomes. The first involves 
the formal or mechanical factors, such as writing, spelling, punc- 
tuation, form, and general appearance. The second treats of certain 
grammatical factors, such as common errors in language form and 
sentence structure and form. The third is concerned with the more
subtle elements of composition, the rhetorical factors involving the 
questions of choice of words, quality of interest innate in the material, 
and logical organization of the subject matter both within the sen- 
tence and the larger units. In the first two phases of the problem of 
written language, the factors are more generally uniform in their 
manner of affecting readers. However, there is greater difficulty in pre- 
dicting the effect of the third phase on the reader. These mechanical 
and grammatical elements constitute in a way the raw material of 
written expression. The rhetorical factors are the results of the 
manner in which these raw materials are put together. They are the 
factors that make for appeal, originality, style, and distinctiveness in 
written expression. The mechanical and grammatical factors are 
relatively tangible, objective, and measurable. The rhetorical factors 
are more intangible, more difficult to identify and to measure, and 
thus far some of these elements have eluded the best efforts to 
measure them objectively. 


2 MEASUREMENT AND DIAGNOSIS OF LANGUAGE ABILITIES 


Oral language scales 


An examination of the foregoing outline of language outcomes 
makes it clear that oral language ability is made up of many related 
general and specific abilities. It is also equally obvious that from the 
standpoint of its social utility oral language is extremely important. 
Yet measurement of oral language abilities is strangely limited. In 
fact, so far as the writers know, there are no adequate standardized 
instruments for the measurement of oral composition that will stand 
inspection in the light of present-day criteria. Some progress has been 
made in the evaluation of techniques for measuring the improvement 
in oral composition, but thus far no practical way of making the 
results available to the classroom teacher has been devised. 


Diagnosis of oral language disabilities 


Considerable progress in the identification of oral language dis- 
abilities and in the development of remedial procedures in oral ex- 
pression has been made by investigators in the field of speech. It is 
clear, however, that any classification of speech disorders must 
necessarily be conditioned by individual points of view. For example,
Blanton³ recognized four fundamental speech disorders: (1) delayed
speech, (2) oral inactivities, (3) letter substitutions, and (4) stuttering.
On the other hand, Mulgrave⁴ treated the problems of speech
pathology under the three main headings indicated in the accompanying
outline, which is adapted from three chapters in her textbook,
Speech for the Classroom Teacher.


PROBLEMS OF SPEECH PATHOLOGY 


I. Functional Speech Disorders

   A. Baby talk (infantile speech sounds and substitutions)
   B. Defective phonation (faulty production of speech sounds)
      1. Inorganic lisping (impure production of sibilant sounds)
      2. Lingual protrusion (misplacement of tongue)
      3. Lateral emission (due to formation of teeth and tongue
         placement)
      4. Nasal emission (poor control of soft palate causing sound
         to be emitted through the nose)
   C. Vulgar speech
      1. Foreign accent (sound omissions, substitutions, intonations
         due to influence of a foreign language)
      2. Regional dialects (conspicuous speech deviation that labels
         speaker geographically)

II. Organic Speech Disorders

   A. Organic lisping (due to malformation of jaws or to failure of
      jaws to meet properly)
   B. Tongue-tie (movement of the tongue impeded)
   C. Cleft palate (defective palate or roof of mouth)
   D. Chronic hoarseness of voice (may be due to pathological
      condition, misuse, or to a neurotic condition)
   E. Nasality (caused by too large a proportion of nasal resonance)
   F. Denasality (too little nasal resonance resulting from chronic
      catarrh, sinus infection, adenoids)

III. Emotional Disorders

   A. Stammering (any habitual hesitation or repetition in forming
      speech sounds)
   B. Neurotic lisping (persists because individual desires to keep
      it in spite of lack of physical cause)
   C. Neurotic hoarse voice (may be due to nervousness or hysteria)


³ Smiley Blanton, “Problems and Methods in the Correction of Defective Speech.”
Speech Training and Public Speaking for Secondary Schools, Report of special
committee of the National Association of Teachers of Speech. Century Co., New York,
1925.

⁴ Dorothy I. Mulgrave, Speech for the Classroom Teacher, Revised edition.
Prentice-Hall, Inc., New York, 1946.


Travis⁵ preferred to group speech disorders under the three following
heads: (1) disorders of rhythm in verbal expression, (2) disorders
of articulation and vocalization, and (3) disorders of symbolic for- 
mulation and expression. 

Disorders of rhythm in verbal expression. This group of speech 
disorders includes stammering and stuttering, which Travis con- 
sidered basically similar. While the number of serious cases of stutter- 
ing is not actually very great, the effect on the individual is so serious 
that it is important for teachers to have some idea of the nature and 
extent of this disorder. Careful surveys indicate that approximately
one pupil per hundred will be a stutterer, with the boys far out- 
numbering the girls in this speech handicap. Apparently there is no 
very definite relationship between stuttering and the mental ability 
of the pupil. 

Since the classroom teacher, no matter how great may be his inter- 
est in the nature and the causes of stuttering, can do very little about 
it, the important thing in connection with instruction in oral language 
is for him to develop the proper understanding of and sympathy for 
the stutterer’s outlook on life. 

Disorders of articulation and phonation. Normal speech implies
the existence of adequate speech equipment in the physical sense 
capable of responding to the proper stimuli. The production of 
speech sounds calls for the most accurate coordination of the physical 
and mental aspects of the speech mechanism. 

Under this category of disorders of articulation and phonation are 
classified all of the defects found in enunciation and voice production, 
including delayed development of speech. Travis⁶ pointed out that
in this field there are two types of speech defects, (1) functional
defects that are due to bad training, and (2) organic defects that come
from injuries or from faulty or abnormal development of the brain or
other organs related to speech. Many of this particular class of speech
defects arise from such organic difficulties as abnormal development
of the tongue, cleft palate and harelip, abnormal development of the
jaws and teeth, adenoids, and defective hearing.

⁵ Lee E. Travis, Speech Pathology, D. Appleton-Century Co., Inc., New York,
1931. p. 37.

⁶ Ibid. p. 37-38, 196, 211.

The treatment for most of these disorders of articulation and 
phonation involves medical, mental, hearing, and speech examina- 
tions. Since these are generally highly technical in character, they 
should probably be undertaken only by the trained specialist in each 
field. 

Disorders of symbolic formulation and expression. Travis⁷ defined
disorders of symbolic formulation and expression as consisting
essentially of “a lack of power to execute with ease acts connected with
articulated speech and the comprehension of spoken words.” The
location of these defects is largely a clinical rather than a classroom 
problem. Accordingly, the teacher, upon the discovery of any cases 
among his pupils who are unable to articulate or to comprehend the 
spoken word, should immediately refer them to a clinical expert. 

From the standpoint of the English teacher and his pupils, speech 
situations as wholes, rather than separate techniques, should be 
emphasized. The development of technical skills in speech is probably
outside of the responsibility of the teacher of English. Aside
from the problems of oral language that are peculiarly in the field 
of speech, a large part of the oral English work should be closely 
integrated with written English and with literature. The work in oral 
English given as an integral part of the English course should dis- 
tinctly avoid the artistic and technical aspects of speech. The teach- 
er's aim should be to conduct the work in such ways that pupils will 
be equipped to meet effectively the normal speech demands of every- 
day life. At the beginning of the work, pupils should be led to present 
and discuss their ideas of the influence of such things as posture, 
voice, enunciation, pronunciation, facial expression, and body ad- 
justments upon the effectiveness of a speaker. 


Skills peculiar to written expression 


The catalogue of language outcomes presented earlier in this chapter
is a reasonably satisfactory classification for the purpose of
contrasting two major types of verbal expression, but it seems
inadequate when considered from the point of view of the complete
identification and analysis of the specific underlying skills upon which
verbal expression depends. For this more exacting purpose a
classification based on such units of language form as the word, the sentence,
the paragraph, and the composition unit, and on certain general
mechanical factors, is superior. In order to present a more concrete
idea of the types of abilities called into play at each of these levels
of language skill, the accompanying detailed outline is given.

⁷ Ibid. p. 232.


DIAGNOSTIC OUTLINE OF LANGUAGE SKILLS 


A. Words—Skill in the spelling, choice, use, and definition
   1. Spelling—ability to spell certain socially useful words
      a. Contractions
      b. Abbreviations
   2. Choice of words
      a. Same
      b. Opposite
      c. Exact word for meaning
      d. Variety
      e. Meaningful words
      f. Minimum number of words
      g. Semantic variations in meanings
   3. Correct usage
      a. Verbs
      b. Pronouns
      c. Modifiers
      d. Nouns
   4. Use of dictionary
      a. Alphabetizing
      b. Use of guide words
      c. Selection of meaning
      d. Using pronunciation keys

B. Sentences—Skill in the use, form, structure, and organization
   1. Form
      a. Complete, coherent, unified
      b. Variation in beginning
      c. Variation in length
   2. Kind
      a. Declarative
      b. Interrogative
      c. Exclamatory
   3. Structure
      a. Simple, compound, complex
      b. Subject and predicate
      c. Variety in structure
      d. Language usage—avoidance of slang and foreign expressions,
         faulty expressions, double negatives
   4. Organization
      a. Logical sequence of ideas
      b. Variety for interest

C. Paragraphs—Skill in the form, structure, and organization
   1. Form
      a. Indentation
      b. Initial and terminal line length
      c. Length
   2. Structure
      a. Unity
      b. Coherence
   3. Organization
      a. Outline
      b. Logical sequence of ideas

D. Letter writing—Skill in selection of content and in use of form and
   mechanics in
   1. Business letters
   2. Social letters
   3. Informal notes
   4. Formal notes

E. Outline form
   1. Organization
   2. Capitalization
   3. Punctuation

F. Bibliographical form
   1. Arrangement for unpublished material
   2. Arrangement for published material
   3. Capitalization
   4. Punctuation

G. General mechanical factors—Skill in control of
   1. Capitalization
      a. Initial words in sentences
      b. Proper nouns
      c. Proper adjectives
      d. Titles of honor and respect
      e. Important words in titles of stories, articles, etc.
   2. Punctuation
      a. End
      b. Series
      c. Quoted matter
      d. Special situations
   3. Margins
      a. Top, bottom, sides
      b. Indentation
   4. Handwriting
      a. Legibility
      b. Speed
   5. Abbreviations
      a. Titles
      b. Other situations
   6. Hyphenations
      a. Compound words
      b. Ends of lines


In spite of the detail of this outline and the number of specific 
skills that contribute directly to language ability, the reader will
immediately recognize certain significant weaknesses. Many of the 
skills are identified only in a very general way. The recognition of 
choice of words as a significant language skill is approximately equal 
to stating that addition is an important skill in arithmetic. Much 
more definite information is necessary before all of the details of a 
constructive program of language improvement can be developed. 
Just as it is necessary to identify the socially useful situations and 
facts, or the most useful words in spelling, it is necessary to identify 
the skills that have the greatest social usefulness in language situ- 
ations. Much excellent work has been done on the problem of deter- 
mining a minimal spelling (writing) vocabulary based on social 
utility. Similar work must be done from the standpoint of language 
situations. Until this is accomplished, workers interested in the de- 
velopment of diagnostic exercises in oral and written expression must 
turn to other sources for valid test and drill materials. 
Measures of general merit of written composition


The measurement of general merit of written composition, while 
dating well back into the history of educational measurement, has 
not responded to efforts to improve it in proportion to the attention 
it has received. This difficulty comes from the great complexity of the 
skills involved in producing merit in written language, and from the 
vagueness with which these skills have been recognized. Historically, 
the Hillegas Composition Scale antedates most other attempts to 
measure educational products. Not only has this scale accomplished 
much good through the stimulation of interest in the more accurate 
measurement of written composition, but it is still a usable instru- 
ment in its present form, the Thorndike Extension to the Hillegas 
Scale for the Measurement of Quality in English Composition by 
Young People. 

Among the more useful of the currently available scales for the 
measurement of composition quality is the Willing Scale for Measur- 
ing Written Composition. This scale is made up of eight specimens 
of composition all written on the topic, “An Exciting Experience.” 
Through the definite recognition of the relation of form errors to the 
general quality of written work this scale increases its usefulness. 
Its value is also enhanced through the very clear directions for the 
collection of compositions for survey purposes. An excellent list of 
interesting topics is also suggested as the basis for the written work. 
The use of such standardized lists of topics and the control of condi- 
tions under which the writing takes place add distinctly to the re- 
liability with which written composition abilities may be measured. 


Standard tests in grammar and usage 


While the measurement of the common grammatical usages is
not confined to the field of written language, the very nature of the 
subject matter itself makes it necessary to measure it in written 
form. For those who believe that there is a formal as well as a func- 
tional aspect of usage, the Kirby Grammar Test still meets the need 
for this type of test. This test measures the ability of the pupil to 
select the correct one of two usages in a sentence situation and to 
recognize the correct grammatical reason for his choice. The content 
of the usage exercises is based on a rather old study of the typical 
errors of children. Numerous comparisons of scores on usage and
grammatical principles right on this test fail to show a significant
positive relationship. Somewhat in contrast with this test, the Iowa 
Grammar Information Test⁸ measures purely informational aspects
of English grammar in eighty specific situations. 


Analytical measurement of language abilities 


In addition to the foregoing tests, each of which presents only 
limited analytical possibilities in the measurement of language, there 
are three or four others that should be mentioned. In the light of the 
criteria for diagnostic measurement that have been set up in this 
volume, most of these tests fall short of being really diagnostic. In 
fact, it is very doubtful if there are any truly diagnostic tests in the 
language field. The Franseen Diagnostic Tests in Language are 
diagnostic only to the extent that they identify difficulties dealing 
with pronouns, verbs, and varied constructions. The Language Sec- 
tion of the Stanford Achievement Tests deals with usage only. The 
Unit Scales of Attainment in Language deal with three aspects of
language ability: capitalization, punctuation, and usage. While these
instruments are primarily designed as elementary-school tests, they 
may be used effectively in Grades 7, 8, and 9. 

The Iowa Language Abilities Test, Intermediate, is a machine- 
scorable, analytical test of 350 items requiring a 46-minute working 
period. This test may be used in Grades 7, 8, and 9 with optional 
use in Grade 10. Standard scores, grade equivalents, and percentile 
equivalents are given for each of the seven parts: (1) spelling, (2) 
word meaning, (3) language usage, (4) grammatical form recogni- 
tion, (5) sentence sense, (6) capitalization, and (7) punctuation. 


The Greene-Stapp Language Abilities Test is an attempt to extend 
the same types of analytical measurement into the high-school and 
college-freshman levels. The five test parts covering capitalization, 
spelling, sentence structure and applied grammar, punctuation, and 
usage and applied grammar require two class periods for administration. 
Tests 3 and 5 represent novel procedures in relating grammatical 
information to sentence structure and usage situations. The 
test uses separate answer sheets designed for hand or machine scoring. 
The technique used in Test 3 is shown in an accompanying 
excerpt. 


8 Fred D. Cram and H. A. Greene, Iowa Grammar Information Test. Bureau of 
Educational Research and Service, State University of Iowa, Iowa City, 1935. 


Excerpt from Greene-Stapp Language Abilities Test 9 


Directions. Some of the sentences in this test are correct. Other sentences 
are incorrect because they are incomplete statements (fragments) 
or because they contain double negatives, unnecessary words, misplaced 
modifiers, the wrong verb form, or other errors. Study each sentence 
carefully to determine whether it is a good or a poor sentence. If you 
think that a sentence is correct and that nothing needs to be done to it, 
make a heavy black mark on the separate answer sheet in the space 
under C. (The sentence is correct.) Then go on to the next exercise. 
However, if you think that a sentence is incorrect, decide which one 
of the three numbered statements below the sentence tells what should 
be done to make it correct. Then, on the answer sheet, opposite the 
sentence number, mark the space that has the same number as your 
choice. The answers to the sample sentences below are marked correctly 
on the separate answer sheet. 


Sample A. John and Henry are going to the show this afternoon. 
C. The sentence is correct. 
1. Place a modifying phrase next to the word it modifies. 
2. Add a subject to make the fragment a sentence. 
3. Add a predicate to make the fragment a sentence. 

For Sample A the right answer is “The sentence is correct,” which is 
answer C. So, on the answer sheet, opposite Sample A, the space under 
C has been filled in. 


Sample B. The box was found by the policeman that was stolen. 
C. The sentence is correct. 
4. Add a subject to make the fragment a sentence. 
5. Place a modifying clause as near as possible to the word it modifies. 
6. Leave out a word or phrase that repeats an idea. 

For Sample B the right answer is “Place a modifying clause as near as 
possible to the word it modifies,” which is answer 5; so, for Sample B, 
a heavy black mark has been made in the space under the number 5. 


9 Harry A. Greene and Helen I. Stapp, Greene-Stapp Language Abilities Test. 
Published by World Book Co., 1952. 


3 APTITUDE MEASUREMENT IN ENGLISH 


Purposes of aptitude measurement 


Academic aptitude testing had its origins in 1925 with the development 
of the Iowa Placement Examinations. The tests of this series 
are intended for use either at the college-freshman or the high-school-senior 
level, although in practice they are probably more useful for 
college students. Aptitude and the rather closely similar prognostic 
tests are now available for several subjects of the secondary-school 
curriculum. The following list of purposes of the Iowa Placement 
Examinations, Aptitude Series, is given here as an indication of the 
desirable results from the use of aptitude tests: 10 


1. To afford a basis for prediction of the character of work that each 
student will do in college. 

2. To aid in selecting and admitting students. 

3. To serve as an entrance examination in lieu of the more time-honored 
essay-type content examination. 

4. To section classes for instructional purposes on the basis of mental 
ability. 

5. To assist in deciding how much work a student can carry. 

6. To deal more effectively with students who are not well oriented in 
their college work. 

7. To give scientific aid in vocational guidance and placement. 

8. To enable comparative studies of intellectual levels as between 
classes, colleges, and college years. 


Aptitude tests in English 


Aptitude tests require no special training in the subject for which 
they are constructed, but are designed to measure those factors that 
contribute to subsequent success in a subject. The manner in which 
test content indicates this characteristic of such tests is shown in 
part by the brief discussion of the Iowa Placement Examination, 
English Aptitude, and the Unit Scales of Aptitude that appears on 
the following page. 


10 George D. Stoddard, Iowa Placement Examinations. University of Iowa Studies 
in Education, Vol. III, No. 2. University of Iowa, Iowa City, 1925. p. 9-10. 


The English Aptitude Test of the Iowa Placement Examination 
Series consists of four parts, which measure (1) ability to compre- 
hend and to apply given rules of grammatical usage to various situ- 
ations, (2) ability to comprehend accurately the thought of a 
compact selection from a college textbook, (3) ability to gain correct 
ideas from a passage of English material, and (4) ability to interpret 
a selection of poetry. The five parts of the Unit Scales of Aptitude 
measure (1) rate of comprehension, (2) perception of relations, (3) 
reading vocabulary, (4) composition vocabulary, and (5) range of 
information. Although this test is not intended as an aptitude meas- 
ure for the field of English alone, as is the English Aptitude Test, 
both are useful tools for the types of administrative and guidance 
functions outlined above. 


4 REMEDIAL INSTRUCTION IN ENGLISH 


Remedial instruction in language will be effective only to the 
extent that pupils are made aware of the social importance of correct 
usage and are led to develop a desire to make use of the best 
forms of expression and to formulate correct habits of usage. 
Language tests of the analytic types should aid in the developing 
of a self-critical attitude on the part of the pupil, which naturally 
leads to the desire to acquire correct habits of expression. 

Specimen types of remedial exercises in language are not presented 
here for two reasons. In the first place, there are countless excellent 
practice and drill books in the English field that provide adequate 
experience in the important skill areas. In the second place, the 
parallel between the desirable types of language drills and the types 
of exercises used in the tests to reveal the presence or absence of the 
skills is very close. 

A helpful organization of diagnostic and remedial suggestions is 
presented in the Manual for Interpreting the Iowa Language Abilities 
Tests. Reproductions of the suggestions for the improvement of 
punctuation and language usage are presented on pages 442 and 443 
as examples of this type of material. Similar suggestions for the im- 
provement of work in spelling, word meaning, sentence sense, and 
capitalization are also given in this manual. 




5 MEASUREMENT AND REMEDIATION IN SPELLING 


Social and educational significance of spelling 


The importance of correct spelling in the written communication 
of ideas is quite generally recognized. Applicants for positions have 
often failed to receive employment because of incorrect spelling of 
words in their letters of application. Business and social status is 
frequently determined to a large measure by a person's mastery or 
lack of mastery in this specific skill. Spelling, because of its social 
significance and its tool value in connection with later school progress, 
is so important that educators in general are unwilling to depend on 
the incidental teaching of it for the development of the required 
skill. Spelling is recognized as one of the subject fields in which the 
learning is specific. The child does not just learn spelling, but he 
learns to spell specific words. He may master a definite method of 
learning to spell, but the words he learns to spell are mastered as a 
result of a definite application of effort and attention. 


Measurable qualities in spelling 


Systematic instruction in spelling is usually not a part of the 
high-school program of studies. Apparently it is assumed that the 
pupil has acquired a method of learning to spell as a part of his 
elementary-school training. Presumably also he has been brought 
into contact with fairly extensive vocabularies of socially useful 
words in these earlier school experiences. There is considerable evi- 
dence, however, that in many high schools specific emphasis on 
spelling may not be entirely wasted effort. 

Ability to spell is usually revealed by the pupil's success in writ- 
ing, in either list or context form, words that have been selected from 
some vocabulary list of social or school importance. For more mature 
pupils of high-school age, the ability to recognize the correct or in- 
correct spellings of words is a most desirable skill. This type of 
ability is most commonly measured by tests of the proof-reading 
type, in which the pupil identifies the misspelled word and corrects it. 


Systematic sampling of words 


The introduction of scientific methods in education in recent years 
has resulted in many investigations into the scope and character of 
spelling lists. Studies such as those by Anderson, Ayres, Fitzgerald, 
O'Shea, Rinsland, Thorndike, and Horn seem to warrant the con- 
clusion that between 4000 and 5000 carefully selected words would 
be an appropriate number for the basic spelling list. Furthermore, 
these studies have proved of great value in selecting the word lists 
to be included in spelling texts, tests, and scales. It is quite ob- 
vious that the words most commonly used in the written language 
activities of adults and children should receive the major emphasis 
in a spelling course of study. To teach pupils to spell words that they 
will very rarely be called upon to spell either in or outside of the 
school is clearly a waste of time. Such words are best left to inci- 
dental learning or to the responsibility of each person as the need for 
their use arises. 


Construction of spelling tests 


In the construction of spelling tests the following four problems 
require careful consideration. 

What words. One of the first problems in the construction of a 
spelling test is that of selecting the words to be included. The values 
of spelling are almost entirely specific, and lie in the ability of the 
pupil to spell words that are actually used and are most certain to 
be used. It is important, therefore, that the content for a test should 
be sampled from those words that are and will be ultimately of 
maximal usefulness to the pupil. 

Among the word lists that have been widely used in the construction 
of spelling tests are one by Anderson,12 comprising the Iowa 
Spelling Scales, the Thorndike Teacher's Word Book,13 and the Horn 
Basic Writing Vocabulary.14 Anderson's list was one of the first to 
be based on an extensive word count. Thorndike's list contains ten 
thousand words that were found to occur most frequently in a count 
of several million words taken from many sources. Horn's list includes 
ten thousand words chosen from varied types of adult writing. 
The words are classified on the basis of frequency, and each word 
frequency is compared with that given in other vocabulary studies. 
This study took into account all previous spelling vocabularies, and 
as a result has greatly influenced the content of recent spelling texts. 
Only 3009 of the ten thousand words in this writing vocabulary were 
designated by the author as being of sufficient social utility to be 
considered basic for elementary-school spelling lists. 


12 W. N. Anderson, Determination of a Spelling Vocabulary Based upon Written 
Correspondence. University of Iowa Studies in Education, Vol. II, No. 1. University 
of Iowa, Iowa City, 1917. 

13 E. L. Thorndike, The Teacher's Word Book. Teachers College, Columbia University, 
New York, 1921. 

14 Ernest Horn, A Basic Writing Vocabulary. University of Iowa Monographs in 
Education, First Series, No. 4. University of Iowa, Iowa City, 1926. 

The Iowa Spelling Scales, which have rendered such excellent service 
for a generation, are unfortunately limited to fewer than 3000 words 
and reflect spelling abilities not entirely representative of the modern 
school child. A new scale to displace this material is now in process 
of development. A critically selected vocabulary of 5500 words of 
known social utility is now being evaluated on a nation-wide basis. 
While this investigation is concerned primarily with results from 
Grade 2 through Grade 8, many of the more difficult words will have 
definite utility at the high-school level. 

Teachers who are using spelling texts made up of word lists of 
unknown social importance will find such sources of great value in 
selecting valid content for their own tests. Words to comprise a 
spelling test should, of course, be among those comprising the list 
studied by the pupils. The most valid types of spelling words on 
which to test a pupil are also those words that have relatively high 
social usage. Thus a cross-check of the words common to the local 
spelling text and to the Iowa Spelling Scale (or other similarly developed 
scales) will reveal the high social-frequency words the pupils 
have studied and will at the same time give the teacher a measure 
of the relative difficulty of the words from their values in the scale 
itself. Thus the teacher may construct his own valid test on words 
of known difficulty. 
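
The cross-check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cross_check`, the sample words, and the difficulty values are hypothetical and are not taken from the Iowa Spelling Scale or any other published scale.

```python
def cross_check(text_words, scale_difficulty):
    """Return the words found in both the local spelling text and the scale.

    text_words       -- words studied by the pupils (hypothetical sample)
    scale_difficulty -- mapping of scale word -> per cent correct (hypothetical values)
    """
    common = {}
    for word in text_words:
        key = word.strip().lower()  # ignore capitalization and stray spaces
        if key in scale_difficulty:
            common[key] = scale_difficulty[key]
    return common


# Hypothetical data for illustration only.
local_text = ["receive", "separate", "zephyr", "Business"]
scale = {"receive": 58, "separate": 46, "business": 62, "believe": 66}

shared = cross_check(local_text, scale)

# List the shared words from hardest (lowest per cent correct) to easiest.
for word in sorted(shared, key=shared.get):
    print(word, shared[word])
```

Words that appear only in the local text ("zephyr" in this sample) drop out, since no scale value is available for judging their difficulty.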

One of the most satisfactory sources of evaluated word lists for 
high-school testing and instructional purposes is the Simmons-Bixler 
Standard High School Spelling Scales. Intended for Grades 7 to 12, 
this list permits the construction of a large number of equivalent 
tests. The scales, containing words of relatively high social frequency 
from the Commonwealth List, consist of 2560 words for study 
purposes and of 2910 alphabetically arranged words of given difficulty 
by grades for use in constructing spelling tests. 

How difficult words. It is well known that some words are more 
difficult than others, i.e., some words are more frequently misspelled 
than others. If words are selected at random from any of the lists 
indicated above, some of them will be easy and some relatively 
difficult. Words for a test should be selected in terms of their known 
difficulty. The words in most spelling scales have been so classified 
by having the words spelled by large numbers of children and the 
relative difficulty of each word determined by the percentage of 
correct spellings of each word. The words to be included in the test 
for any grade should be adapted if possible to the ability of the group 
to be tested. Classes of average ability appear to respond best to 
words of approximately 50 per cent difficulty.15 On the other hand, 
if the test is to be given over a wide spread of ability, words ranging 
from 14 to 86 per cent standard accuracy with a mean of 50 per cent 
tend to give a distribution more closely approximating the normal 
frequency curve, with the pupils grouped more closely around the 
mean. In general, it is probably safe to say that the words to be 
included in a test for any grade should be those on which there are 
from 40 to 70 per cent misspellings. Tests made up of such words will 
give a reliable measure of spelling ability, since the words will not 
be so easy that there will be many perfect scores or so difficult that 
there will be many low scores. 
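
As a rough sketch of the selection rule just stated, the following Python fragment computes the percentage of misspellings for each word from trial results and keeps the words falling between 40 and 70 per cent. The function names and the trial data are hypothetical; an actual scale would rest on the spellings of large numbers of children.

```python
def percent_misspelled(results):
    """Map each word to its percentage of misspellings.

    results -- mapping of word -> list of booleans, True for a correct spelling
    """
    rates = {}
    for word, attempts in results.items():
        if not attempts:
            continue  # no trial data, so no difficulty value can be assigned
        wrong = sum(1 for ok in attempts if not ok)
        rates[word] = 100.0 * wrong / len(attempts)
    return rates


def select_words(rates, low=40.0, high=70.0):
    """Keep words whose percentage of misspellings lies within [low, high]."""
    return sorted(w for w, r in rates.items() if low <= r <= high)


# Hypothetical trial results for twenty pupils.
trial = {
    "cat": [True] * 19 + [False],            # 5 per cent misspelled: too easy
    "separate": [True] * 10 + [False] * 10,  # 50 per cent misspelled
    "receive": [True] * 8 + [False] * 12,    # 60 per cent misspelled
    "rhythm": [True] * 2 + [False] * 18,     # 90 per cent misspelled: too hard
}

rates = percent_misspelled(trial)
chosen = select_words(rates)
print(chosen)
```

The limits are left as parameters so that a narrower band around 50 per cent can be tried for classes of average ability.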

How many words. The purpose the test is to serve will determine 
the number of words to use. For survey purposes a list of 25 words 
will probably be sufficient to determine the status of spelling effi- 
ciency for a school system. To be sure, the ability to spell one word 
is separate and distinct from the ability to spell other words. It 
would seem necessary, therefore, to subject a pupil to several hundred 
words in order to secure a reliable measure of his ability to spell the 
most commonly used words. However, the procedure of sampling 
applies to the testing of spelling as in all other testing. While 25 
words is possibly a sufficient number for survey purposes, a larger 
number is needed to reveal the spelling ability of individual pupils. 
On the whole it appears that a minimum of 100 words should probably 
be used for individual testing purposes in spelling. Possibly 50 
words are not too many to use for the measurement of general class 
accomplishment. 
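
The sampling procedure may be illustrated with a short Python sketch that draws a random sample whose size depends on the purpose of the test: 25 words for a survey, 50 for class measurement, and 100 for individual testing, as suggested above. The names `WORDS_BY_PURPOSE` and `draw_test_words` and the word pool are hypothetical, and simple random sampling is an assumption made here for illustration rather than a procedure prescribed in the text.

```python
import random

# Test lengths suggested in the discussion above.
WORDS_BY_PURPOSE = {"survey": 25, "class": 50, "individual": 100}


def draw_test_words(word_list, purpose, seed=None):
    """Draw a random sample of words sized for the stated purpose."""
    n = WORDS_BY_PURPOSE[purpose]
    unique = sorted(set(word_list))  # drop duplicates; sort for reproducibility
    if len(unique) < n:
        raise ValueError("word list too short for this purpose")
    rng = random.Random(seed)  # a fixed seed reproduces the same test form
    return rng.sample(unique, n)


# Hypothetical pool standing in for a scaled word list.
pool = ["word%03d" % i for i in range(300)]

survey_form = draw_test_words(pool, "survey", seed=1)
individual_form = draw_test_words(pool, "individual", seed=1)
print(len(survey_form), len(individual_form))
```

Drawing with different seeds from the same pool yields alternate forms of the same length.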

15 Walter W. Cook, The Measurement of General Spelling Ability Involving Controlled 
Comparisons between Techniques. University of Iowa Studies in Education, 
Vol. VI, No. 6. University of Iowa, Iowa City, 1932. 

How given. The question of the form in which spelling words 
should be presented for testing purposes has called forth much 
debate in the past. It has also been subjected to experimental study 
with results that are not too conclusive when considered in the light 
of practical classroom procedures. Horn summarized the evidence 
on this question as follows: 16 


Written tests are to be preferred to oral tests... Recall tests are 
superior to and more difficult than recognition tests. The evidence indicates 
that the most valid and economical test is the modified sentence 
recall form, in which the person giving the test pronounces each word, 
uses it in an oral sentence, and pronounces it again. The word is then 
written by the students. 


Diagnosis and remediation of spelling disabilities 


Spelling tests and scales afford valuable sources of material that 
may be used to determine both the pupil’s present status in spelling 
and his growth in accomplishment as a result of a period of instruc- 
tion. If scales based on a sound philosophy of subject-matter content 
are used, they provide the most effective materials for the identifica- 
tion of the spelling difficulties of individual pupils. Samplings from 
scales used as tests give the teacher an objective basis for the study 
of these personal difficulties through the accumulation of individual 
lists of words that are sources of trouble. 

To a large extent remedial procedures in spelling may be under- 
taken directly in connection with teaching. The words misspelled 
by pupils in their spelling lessons and tests are obviously the words to 
which they should give special attention. Each pupil should be en- 
couraged to keep an individual list of such words and should be 
stimulated to master them. Occasional spelling periods should be 
put aside for studying and testing these individual lists. If such lists 
are properly utilized, each pupil will come to regard his “demon” 
list as an effective means for eliminating spelling deficiencies. 
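
The keeping of an individual "demon" list can be sketched as a simple tally. The following Python fragment is only an illustration; the function names and the sample misspellings are hypothetical.

```python
from collections import Counter


def update_demon_list(demon_list, misspelled_words):
    """Add the words missed in one lesson or test to the pupil's running tally."""
    for word in misspelled_words:
        demon_list[word.lower()] += 1
    return demon_list


def words_for_review(demon_list, min_misses=1):
    """List words missed at least min_misses times, most frequently missed first."""
    return [w for w, n in demon_list.most_common() if n >= min_misses]


# Hypothetical record for one pupil over two tests.
demons = Counter()
update_demon_list(demons, ["Receive", "separate"])
update_demon_list(demons, ["receive", "rhythm"])

print(words_for_review(demons))                # all troublesome words
print(words_for_review(demons, min_misses=2))  # only the persistent ones
```

Words missed repeatedly rise to the head of the list and can be given first attention in the occasional periods set aside for these individual lists.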

Written work in all subjects should be carefully checked for spell- 
ing errors. A list of such misspellings should be kept by every pupil, 
and he should realize that he is to be held responsible for the mastery 
of these troublesome words. The important thing is that the learning 
situation be so manipulated that the pupil will want to learn to spell 
and to feel the need for learning the meaning and spelling of words 
that are pertinent to his written work. 


16 Ernest Horn, “Spelling.” Encyclopedia of Educational Research, Revised edition. 
Macmillan Co., New York, 1950. p. 1259. 


Individual pupil diagnosis 


The discovery from the results of a spelling test that a pupil is 
below the norm in spelling ability may be of considerable value, but 
it falls far short of its real function unless it reveals to the pupil the 
particular weaknesses that resulted in his low score. The following 
items of information procurable through observation and measure- 
ment are invaluable in diagnosing individual pupil disabilities and 
should be used as much as possible in connection with the analysis 
of the pupils’ spelling habits: (1) intelligence quotients, (2) spelling 
marks, (3) reading marks, (4) writing marks, (5) attendance data, 
(6) visual-defects data, (7) auditory-defects data, (8) speech data, 
(9) general health data, (10) personality characteristics—industry, 
aggressiveness, independence, attentiveness, exactness. 

Tidyman and Butterfield suggested the following procedure in 
diagnosing and treating problem cases in spelling: 17 


1. Give a standard spelling test to discover the amount of deficiency. 
Compare with achievement in other subjects. 

2. Give an intelligence test to discover general mental capacity. 

3. Test for defects of hearing and vision. 

4. Give reading test. 

5. Give test of spelling consciousness to show whether mistakes are due 
to carelessness or ignorance of the word. 

6. Collect misspellings from spelling tests and written work, and classify 
them according to types of errors. 

7. Get as much information as possible about the pupil's pedagogical 
history, especially methods of beginning reading; knowledge of 
meanings of words; knowledge of phonics; pronunciation and ar- 
ticulation; motor coordination in writing; and emotional attitude 
toward spelling. 

8. From above, assemble probable causes of difficulty in spelling, and 
adopt appropriate remedial measures, such as the following: 


(a) Systematic word study. Early training may have been inadequate. 

(b) Exercises in visualization. 

(c) Drill upon particular types of spelling errors. 

(d) Phonics drills. 

(e) Removal of physical defects. 

(f) Develop confidence through successful effort. 


17 Willard F. Tidyman and Marguerite Butterfield, Teaching the Language Arts. 
McGraw-Hill Book Co., Inc., New York, 1951. p. 359, quoted from William H. 
Burton, ed., Supervision of Elementary Subjects. D. Appleton & Co., New York, 
1929. p. 121-22. 


Remedial work in spelling 


Poor spelling is due to faulty or inadequately formed associations. 
Basically, all spellers, good or bad, learn in the same way—through 
association. The main difference between the able and the poor 
speller lies in the study technique used, his personality character- 
istics, and the emphasis he gives to the subject. 

Many investigators of spelling disabilities have abandoned the 
procedure of deducing the causes of spelling difficulties from an 
analysis of errors and are now devoting their time and energies to 
studying the work habits of pupils by means of careful observation 
and tests. A useful summary of many causes of spelling deficiency 
together with suggested remedial treatment is given in the accompanying 
chart adapted from a current language test manual.18 


6 MEASUREMENT AND REMEDIATION IN HANDWRITING 


In spite of increased availability, popularity, and use of mechanical 
means for writing, both in school and out, there is little reason to 
believe that handwriting will be displaced as the major means of 
written communication. If it is to serve as an adequate tool in social 
and business communication, handwriting must be easily read, neat 
and pleasing in appearance, and of such form that it can be produced 
under normal conditions with a fair degree of speed. A functional 
program in handwriting, according to a discussion of the subject, 
is one which "(1) defines competency in terms of standards 
acceptable in the social and business correspondence of adults; (2) 
encourages individuality of style; (3) emphasizes legibility, appear- 
ance, and ease of writing; (4) eliminates formal drills and limits 
practice to meeting immediate, recognized needs; (5) relates hand- 


18 Manual for Interpreting Iowa Language Abilities Tests. World Book Co., 
Yonkers, N. Y., 1948. p. 24-25. 


THE SECONDARY SCHOOL 


TABLE 32. Remedial Chart: Spelling 

POSSIBLE CAUSES OF LOW TEST SCORES / ADDITIONAL EVIDENCE OF DEFICIENCY / SUGGESTED REMEDIAL TREATMENT 

1. Cause: Lack of experience with the testing technique. 
Evidence: Low score on test contrasted with high score when words are given on dictation test. 
Treatment: Drills on choosing correct spellings from lists of errors of same word; choosing correctly spelled words from lists some of which are spelled correctly, some incorrectly; proofreading own written work carefully. 

2. Cause: Instructional emphasis on different or wrong vocabulary. 
Evidence: Low score on test in contrast with good record for spelling in daily work. 
Treatment: [illegible] the words now taught in the spelling course of study with lists of known social utility. (See references [illegible].) 

3. Cause: Failure to develop a [illegible] attitude toward good spelling. 
Evidence: Noticeable indifference to spelling errors in daily written work. 
Treatment: Emphasize [illegible] proofreading of own work. Drill on recognizing correct spellings in lists of words. Check pupil's certainty of his judgment of correctness of spelling. 

4. Cause: Lack of teaching emphasis on individual's own spelling difficulties. 
Evidence: Observation of pupil's misspellings in daily written work. 
Treatment: Have pupil keep lists of words misspelled in daily work and use these as basis for individual study. Work on pupil's own errors. See that transfer is made to other written work. 

5. Specific learning difficulties: 

a. Cause: Faulty pronunciation of words by the pupil. 
Evidence: Observation of speech habits; informal pronunciation tests based on spelling vocabulary. 
Treatment: Look up word in dictionary. Pronounce it distinctly for pupil; have him repeat it while looking at the written form of word to associate sight with correct sound. 

b. Cause: Difficulties in seeing or hearing. 
Evidence: Observation; doctor's or nurse's examination. 
Treatment: Refer to doctor for medical advice. Move pupil to front of room near window and blackboard. Stand near him during tests and spelling exercises. Make special effort to speak and write clearly. 

c. Cause: Limited power to visualize or "see" word forms. 
Evidence: Observation test. Try visualization test, as trying to see in mind's eye a three-inch [illegible] painted red. Ask detailed questions about number of [illegible], number of planes to cut it into one-inch cubes, number of small cubes, etc. 
Treatment: Emphasize the practice of looking at the word, closing the eyes, and attempting to recall the word, as part of every spelling study period. 

d. Cause: Failure to associate sounds of letters and syllables with spelling of words. 
Evidence: Individual interview; analysis of spelling errors in tests and daily work. 
Treatment: Go over words with child before he studies them in spelling. Teach him to analyze words himself. 

e. Cause: Tendency to transpose, add, or omit letters. 
Evidence: Analysis of spelling test papers; observation of daily work; pronunciation tests. 
Treatment: Emphasize visual recall of words. Have pupil practice writing the words, exaggerating the formation of letters. Underline individual hard spots. 

f. Cause: Tendency to spell unphonetic words phonetically. 
Evidence: Note types of errors made in spelling test, especially insertion or omission of letters. 
Treatment: Show that all words are not spelled as they sound. Spelling of each word must be learned individually. Emphasize steps in learning to spell. See h below. 

g. Cause: Difficulties in writing; letter formation. 
Evidence: Observation of daily written work and spelling papers; [illegible] with handwriting scales, as Ayres or [illegible]. 
Treatment: Practice in difficult letter formations and combinations. Emphasize need to avoid confusing letter forms [examples illegible]. 

h. Cause: Failure to master a method of learning to spell. 
Evidence: Low scores on daily spelling tests; observation of pupil's method of study in spelling; test on steps in learning to spell. 
Treatment: [illegible] child's method of learning in spelling. Teach child steps in learning to spell until he uses them. Suggested steps: (1) Look at word. (2) Listen while teacher pronounces it. (3) Pronounce it by syllables. (4) Use it in a sentence. (5) Close eyes and visualize it. (6) Write it. (7) Close eyes and recall. (8) Write word. Repeat steps as necessary. 



writing to written composition; (6) favors a natural arm-hand- 
finger movement, adapted to age and maturity ; and (7) permits the 
use of handwriting materials commonly used in the home and the 
business world." 19 

Objectives and measurable qualities in handwriting. A concise 
summary of the objectives of instruction in handwriting is given 
below. While this statement is not especially new or recent it nevertheless 
provides a very useful basis for the identification of the 
measurable objectives in this important skill of expression.20 


OUTCOMES oF HANDWRITING INSTRUCTION 


1. To develop sufficient skill to enable pupils to write easily, legibly, 
and rapidly enough to meet present needs and social requirements. 

2. To equip the child with methods of work so that he will attack his 
writing problems intelligently. 

3. To diagnose individual writing difficulties. 

4. To aid the child to recognize and make use of his peculiar learning 
capacities. 

5. To provide experiences which will tend to develop in the child more 
power to direct his own practice, and more ability to judge whether 
or not he is succeeding in that practice. 

6. To provide the means for each individual to progress at his best 
rate. 

7. To develop an appreciation of the relationship between correct body 
adjustment and an efficient writing production. 

8. To secure acceptable and customary arrangement and form for 
written work (margins, spacing, etc.). 

9. To develop a social urge to use the skill attained in all writing 
situations. 

10. To train pupils to be able, at the end of the sixth grade, to write 
quality 60 (Ayres Scale) or better, and at the rate of 70 letters per 
minute or better. 


Writing involves a very exact type of visual-muscular coordination, 
which must be developed to a high degree if the product is to possess 
legibility, speed of production, and esthetic quality. Some difficulty 
has been encountered in the measurement of certain of the elements 
of good writing, particularly from the point of view of analysis and 
diagnosis. The available writing scales, however, have done much to 
establish for the pupil and teacher rather definite standards or ideals 
of what constitutes an acceptable product as well as to make both 
more and more sensitive to handwriting faults and needs. 


19 Tidyman and Butterfield, op. cit. p. 362. 

20 “Handwriting.” The Nation at Work on the Public School Curriculum. Fourth 
Yearbook of the Department of Superintendence, National Education Association, 
Washington, D. C., 1926. p. 113-14. 

The measurement of handwriting quality in its refined form is 
concerned with two factors: (1) quality, or degree of legibility, and 
(2) speed, or the quantity of writing produced in a given unit of time. 
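
The speed factor reduces to a simple rate. A minimal Python sketch follows, assuming that only alphabetic characters are counted as letters and that the sample is timed in seconds; the function name and the sample are hypothetical.

```python
def letters_per_minute(sample_text, seconds):
    """Return the writing rate in letters per minute for a timed sample."""
    if seconds <= 0:
        raise ValueError("timing must be a positive number of seconds")
    # Count letters only; spaces and punctuation are ignored (an assumption).
    letters = sum(1 for ch in sample_text if ch.isalpha())
    return letters * 60.0 / seconds


# Hypothetical two-minute sample containing 140 letters.
sample = "a" * 140
rate = letters_per_minute(sample, 120)
print(rate)
```

A rate computed in this way can be set beside a stated standard, such as the 70 letters per minute named in the tenth outcome listed earlier.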

Quality. The quality of handwriting is usually determined by 
comparing a sample of handwriting with specimens in a standard 
scale. While this method of evaluating handwriting specimens is 
somewhat subjective, experience shows that considerable skill and 
objectivity can be developed through training in the use of such 
scales. At one time the measurement of handwriting simply involved 
the comparison of the script produced with the copybook sample. 
This resulted in overemphasis on the shape and shading of letters 
and in the formation of beautifully engraved lines. Rate and quality 
were not the objectives of writing instruction or of measurement 
under those conditions. 

The essentials of quality in writing are measurable within reason- 
able limits. A number of scales have been developed for use in meas- 
uring quality, but they differ greatly in the type of copy used, in the 
number of elements of quality measured, and in the numerical desig- 
nation of each quality difference. Therefore, it is difficult to compare 
the results secured from the use of one scale with those secured from 
another. 

West and Freeman²¹ proposed the quality and rate norms for the
Ayres Scale as shown in Table 33. These norms for quality are
considered by many to be sufficiently high, although the values assigned
for the various grades are based upon the median performance of
many school children. The norms proposed by West and Freeman
do not agree exactly with those accompanying the Ayres Scale.
Perhaps the Ayres data may be considered as norms, while the quality
and rate values proposed by Freeman may more nearly represent
standards.


21 Paul V. West and Frank N. Freeman, “Handwriting.” Encyclopedia of
Educational Research, Revised edition. Macmillan Co., New York, 1950. p. 524-29.
Koos investigated the quality of adult handwriting and also the
opinions of adults concerning a satisfactory quality of handwriting.
He reached the following conclusions on the basis of his findings:²²


The fact that some (pupils) will go into pursuits demanding a quality
better than 60 should not be offered as a justification for requiring all
pupils to attain that better quality.... Since all should be required to
learn to write as well as 60 for purely social use, to train pupils to write
this quality is the task of general education; to teach some who are going
into commercial or other vocations requiring a higher quality ... to write
this better quality is the task not of general but of vocational
education.... In the light of these facts it is difficult to see why ... a pupil
should be required to spend the time necessary to learn to write better
than the quality of 60. There is even considerable justification for setting
the ultimate standard at 50.


Rate. The rate at which pupils write is of considerable importance.


The person who is able to write more rapidly than others and with 
approximately the same quality has an obvious advantage in the 


TABLE 33. Handwriting quality and speed standards

Grades                      |  2 |  3 |  4 |  5 |  6 |  7 |  8
Quality on Ayres Scale      | 44 | 47 | 50 | 55 | 59 | 64 | 70
Rate in Letters per Minute  | 36 | 48 | 56 | 65 | 72 | 80 | 90


field of written expression, provided, of course, that ideas come to 
him as rapidly as he is able to transcribe them. The measurement of 
rate in writing is much less difficult than the measurement of quality. 
Rate of writing can be measured most conveniently by asking pupils 
to write, within carefully controlled time limits, selections from 
standardized copy. If the pupils all write from the same selection 
and if they have all thoroughly memorized it, the number of letters 
each pupil writes in the time allotted can easily be computed as the 
pupil's rate score. Table 33 indicates that pupils at the end of the
second grade should write at the rate of 36 letters per minute and 
increase their speed to 48 letters per minute by the end of the third 


22 L. V. Koos, “The Determination of Ultimate Standards of Quality in
Handwriting for the Public Schools.” Elementary School Journal, 18:423-46;
February 1918.
grade and to a rate of 90 letters per minute by the end of the eighth
grade.
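
As a rough illustration of how the Table 33 values might be applied, the following Python sketch compares one pupil's scores with the values listed for his grade. The grade keys (2 to 8) follow the reading of Table 33 given in the text; the function name and the sign convention for the differences are assumptions made for the illustration and are not part of the Ayres Scale or of the West and Freeman proposal.

```python
# Table 33 values keyed by grade: (quality on Ayres Scale, letters per minute).
# Grades 2-8 are assumed from the text's description of the table.
TABLE_33 = {
    2: (44, 36),
    3: (47, 48),
    4: (50, 56),
    5: (55, 65),
    6: (59, 72),
    7: (64, 80),
    8: (70, 90),
}


def compare_to_table_33(grade, quality, rate):
    """Return (quality difference, rate difference) from the tabled values.

    Positive numbers mean the pupil is above the tabled value for the
    grade; negative numbers mean he is below it.
    """
    if grade not in TABLE_33:
        raise ValueError(f"Table 33 lists no values for grade {grade}")
    tabled_quality, tabled_rate = TABLE_33[grade]
    return quality - tabled_quality, rate - tabled_rate


if __name__ == "__main__":
    # A sixth-grade pupil writing quality 60 at 70 letters per minute.
    print(compare_to_table_33(6, 60, 70))
```

For the sixth-grade pupil in the example, the sketch reports a quality one point above and a rate two letters per minute below the tabled values.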


Measurement of handwriting ability 


Measurement of the quality of handwriting and the rate at which
it is produced is accomplished by the evaluation of handwriting 
specimens secured under standard conditions. Accordingly, the first 
step in the process of measuring handwriting is that of securing these 
specimens under controlled conditions. 

Securing handwriting specimens. Three factors appear to affect 
the conditions under which handwriting specimens for scaling are 
secured. The character of the copy the pupils are called upon to 
write may significantly influence their reactions. Whatever sample 
is used should be simple and easily understood so that the pupils 
will not be unduly affected by spelling and vocabulary difficulties. 
The instructions given to the pupils may also influence the quality 
and rate of their writing. Therefore, care should be exercised to use 
very precise directions such as those suggested with the handwriting 
scale in use. If the specimens are to be used as a survey test, standard 
copy such as the “Gettysburg Address” should be used. The time
allowance for writing the specimens is a third factor that must be
considered in the collection of writing specimens. In the standardization
of his scale, Ayres used the first four sentences of Lincoln’s
“Gettysburg Address” and allowed each pupil two minutes in which 
to copy as much of this material as possible. Since then it has be- 
come a rather typical practice to allow a two-minute period for the 
writing of such samples. 

Securing rate scores. Rate of handwriting is usually expressed in 
terms of the number of letters written per minute. This is determined 
by counting the total number of letters written by each pupil and 
dividing this by the number of minutes the pupils were allowed to 
write. 
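
The computation just described can be sketched in a few lines of Python. The function name is hypothetical, and counting only alphabetic characters as letters is an assumption; the directions accompanying a particular scale may prescribe a different count.

```python
def letters_per_minute(specimen, minutes):
    """Rate score: letters written divided by minutes allowed.

    Only alphabetic characters are counted; spaces, digits, and
    punctuation are ignored (an assumption of this sketch).
    """
    if minutes <= 0:
        raise ValueError("the time allowance must be greater than zero")
    letters = sum(1 for ch in specimen if ch.isalpha())
    return letters / minutes


if __name__ == "__main__":
    # A pupil who copied only this much in a two-minute period.
    print(letters_per_minute("Four score and seven years ago", 2))
```

The sample phrase contains 25 letters, so a two-minute allowance gives a rate score of 12.5 letters per minute.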

Securing quality scores. The quality of the handwriting specimen
being measured is determined by moving it along the scale until a
specimen is found that closely matches it in quality. The quality
value of the scale sample is then assigned as the quality of the
sample of the pupil's handwriting. As the scorer gains experience,
intermediate values may be estimated.

Accuracy in measurement of handwriting. Skill in the evaluation 
of handwriting specimens requires a thorough understanding of the
scale to be used and considerable training in its use. It is desirable, 
therefore, for the teacher, prior to any attempt to use the scale for 
the measurement of handwriting quality, to study carefully the scale 
itself, the directions for its use, the norms, and the specific functions 
which the particular scale is expected to perform. The accurate and 
reasonably objective rating of handwriting samples on a scale re- 
quires considerable skill, which experience shows can be developed 
through practice. For this purpose standard sets of writing samples 
of known quality are very useful. 


Merit scales 


Handwriting scales may be divided into two groups: (1) general
merit scales and (2) analytical and diagnostic charts and scales. The 
choice of a scale depends on the purpose it is to serve. 

The Thorndike Scale was the first writing scale to be devised. This 
scale is designed for Grades 5 to 8 inclusive and consists of a series 
of specimens of handwriting so arranged that they increase in order 
of merit from a quality of 4 units above zero to one of 18. Its pur- 
pose is to aid teachers in grading handwriting for “general merit” 
on the basis of three characteristics: beauty, legibility, and char- 
acter. 

The Ayres Handwriting Scale,²³ the next scale to be devised, was
standardized on the basis of legibility. Legibility was determined by
the speed and ease with which the samples of handwriting were read
by a number of trained and competent judges. The Gettysburg
Edition, now in general use, contains only one style of handwriting,
the accepted moderate-slant style.

The American Handwriting Scale developed by West is one of the 
most recent and comprehensive of the general merit scales. Among 
a number of distinctive features are at least two that deserve special 
mention: (1) A separate scale is provided for each grade from 2 to 8;
(2) The samples have been scaled for both quality and rate, the 
poorer samples being written at a slower rate and the better samples 
being the ones written at a more rapid rate. The existence of the 
separate scales for Grades 2 to 8 inclusive permits a somewhat more 

23 Leonard P. Ayres, A Scale for Measuring the Quality of Handwriting of School
Children. Bulletin No. 113. Division of Education, Russell Sage Foundation, New
York, 1912.
exact evaluation of quality of writing in its relation to grade location.

Diagnosis and remediation of handwriting. Instruments for the 
identification of specific faults in handwriting are of two general 
types: (1) analytical scales and (2) score cards.

The Freeman Chart for Diagnosing Faults in Handwriting is vir- 
tually five scales in one. Each scale is designed to reveal whether the 
pupil's writing specimen violates one or more of the five essential 
characteristics of good handwriting. These traits are: (1) uniformity 
of slant, (2) uniformity of alignment, (3) quality of line, (4) letter 
formation, and (5) spacing. Each scale shows three levels of quality 
for the trait with which it deals—excellent, mediocre, and poor. This 
scale is valuable because it enables both teacher and pupil to discover
the handwriting weaknesses that are in need of special treatment.
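
A minimal Python sketch of how ratings on such a chart might be recorded and searched, assuming that the teacher enters one of the three levels for each of the five traits. The data layout, the function name, and the default threshold are illustrative assumptions and are not part of the Freeman chart itself.

```python
# The five traits and three levels named in the text, in chart order.
TRAITS = (
    "uniformity of slant",
    "uniformity of alignment",
    "quality of line",
    "letter formation",
    "spacing",
)
LEVELS = ("poor", "mediocre", "excellent")  # ordered from lowest to highest


def traits_needing_attention(ratings, threshold="mediocre"):
    """List the traits rated at or below the threshold level."""
    limit = LEVELS.index(threshold)  # raises ValueError for an unknown level
    missing = [trait for trait in TRAITS if trait not in ratings]
    if missing:
        raise ValueError("no rating recorded for: " + ", ".join(missing))
    return [t for t in TRAITS if LEVELS.index(ratings[t]) <= limit]


if __name__ == "__main__":
    specimen = {
        "uniformity of slant": "poor",
        "uniformity of alignment": "excellent",
        "quality of line": "excellent",
        "letter formation": "excellent",
        "spacing": "mediocre",
    }
    print(traits_needing_attention(specimen))
```

Setting the threshold to "poor" narrows the list to the traits most in need of special treatment.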


Diagnosis by analysis 


Improvement in handwriting instruction depends to a large degree 
on the teacher's knowledge of the elements that make for quality in 
the product, and the use of instruments that are adequate to reveal 
significant differences in quality. Inferior products of handwriting 
instruction may be due to lack of skill or mastery in many different
phases of the writing act. Freeman's Chart for Diagnosing Faults in 
Handwriting meets this need for securing separate measures of the 
several aspects of handwriting performance. This scale may be used 
to measure the whole class, but it is most effective when used to 
diagnose the writing of pupils who rank conspicuously below the 
grade norm as revealed through use of some general merit scale. 

The following list of handwriting defects and their causes should 
be useful to the classroom teacher. 


ANALYSIS OF DEFECTS IN HANDWRITING AND THEIR CAUSES

Defect                       Causes

1. Too much slant            (1) Writing arm too near body
                             (2) Thumb too stiff
                             (3) Point of nib too far from fingers
                             (4) Paper in wrong position
                             (5) Stroke in wrong direction

2. Writing too straight      (1) Arm too far from body
                             (2) Fingers too near nib
                             (3) Index finger alone guiding pen
                             (4) Incorrect position of paper

3. Writing too heavy         (1) Index finger pressing too heavily
                             (2) Using wrong pen
                             (3) Penholder too small diameter

4. Writing too light         (1) Pen held too obliquely or too straight
                             (2) Eyelet of pen turned side
                             (3) Penholder too large diameter

5. Writing too angular       (1) Thumb too stiff
                             (2) Penholder too lightly held
                             (3) Movement too slow

6. Writing too irregular     (1) Lack of freedom of movement
                             (2) Movement of hand too slow
                             (3) Pen gripping
                             (4) Incorrect or uncomfortable position

7. Spacing too wide          (1) Pen progresses too fast to right
                             (2) Too much lateral movement


Physical conditions and materials 


Prominent among the physical factors affecting the pupil’s
handwriting is his desk. The pupil’s desk should be adjusted to his height
so that when he is seated normally his thigh is at right angles to the 
lower part of his leg and his feet are flat on the floor. In accordance 
with most modern methods of writing, the pupil’s body, when he is 
writing, should face the middle of the desk squarely and bend slightly 
forward at the hips. For right-handed writers, both forearms should 
be well up on the desk, the left holding the paper, the right wrist 
raised and inclined slightly to the right. It is necessary that the pupil 
be taught to move the paper upward and to the left as the writing 
progresses. The shifting is done with the left hand, while the right 
arm is held in the correct position. There is some difference of opinion 
about the best position of the writing arm. It is generally agreed, 
however, that the writing hand should be supported on the third and 
fourth fingers and that the wrist should not be tilted more than 45 
degrees. The forearm of the right hand should be perpendicular to 
the line of writing. The pen should be grasped lightly and in such a 
way that the forefinger is below the thumb and at least one inch 
above the point of the pen. 




EXPRESSIVE LANGUAGE ARTS 459 


Experiment and observation show that modern writing is a com- 
bination of whole arm, forearm, wrist, and finger movements. It is 
not possible or even desirable to eliminate finger movement entirely, 
even in so-called “muscular movement writing.”


Improvement of psychological conditions 


Next in importance in preparing the way for effective mastery of 
writing faults is the provision of desirable psychological conditions. 
The establishment of a desire for improvement on the part of the 
pupil is essential. One plan that has been proved to be quite effective 
involves the pupils’ use of handwriting scales for the appraisal of 
their own writing. A copy of some good general merit scale should 
be conveniently placed in every classroom to encourage and train 
pupils in its use as a means of facilitating comparisons and evalu- 
ation of personal products. 

Another means of motivation is the exemption from further pen- 
manship drill of all pupils who have attained an acceptable standard 
of speed and quality. The standard of 60 for speed and quality is the
one generally accepted. Evidence seems to indicate that from 50 to
75 per cent of the pupils in the upper grades can easily reach this
standard. If these pupils are exempt from further drill, the teacher 
is able to devote more time to those who have failed to meet the 
standard. 

For improving the rate of slow writers, the writing of some simple
sentence such as “The quick brown fox jumps over the lazy dog” is
recommended. Instruction in the making of different letters may be
required by some children. Pupils are greatly helped by special
practice on the letters and strokes that have given them trouble as a
means of enabling them to attain accepted standards of speed and
quality.


Handedness as a factor in writing 


In addition to the physical and psychological conditions discussed 
in the preceding paragraphs, there is the very important factor of 
handedness in the pupil. The general considerations of method and 
remedial procedures in handwriting appear to assume right-handed- 
ness in the child. Yet left-handedness is common enough in the class- 
room to represent a significant problem to the teacher, and one 
worthy of some consideration here. Naturally enough the pursuit of
methods of instruction and remedy suitable for the right-handed
pupil results in the formation of atrocious writing habits for the
left-handed pupil. Any attempt to force him to conform to common
right-handed practices usually forces him to write backwards, i.e.,
toward the left. In order to correct for the resultant reversal of the 
image, the pupil frequently twists his left wrist around in such a 
way that the pencil or pen point is directed toward him, with the 
result that he works awkwardly and under a most severe muscular 
maladjustment. For these and for other reasons that appear to be 
related to the speech and language functions, the teacher should 
probably not attempt to change over the left-handed pupil. It is 
almost certainly better to accept the tendency to left-handedness 
which is well developed by the time the child enters the first grade 
and to aid him in making the best possible adjustments and adapta- 
tions in his mastery of handwriting than it is to run the risk of con- 
fusing him and possibly causing serious emotional disturbances at 
a later time. There is little or no evidence that the child at birth has 
any predispositions to use one hand rather than the other. Since he 
lives in a predominantly right-handed world in which it is easier to 
conform than not to conform, parents may profitably give some 
serious attention to the problem during the child's early formative 
years. The proper time to affect the child's handedness without 
danger of harmful reactions would seem to be in the period from his 
first active moment until he reaches school age. 


Topics for Discussion 


1. What are the major situations in life in which language is used?

2. Evaluate the relative demands made by life situations on the oral
and written aspects of language.

3. From the standpoint of classroom emphasis, which should receive
more attention, oral or written language?

4. According to Travis, what are the major causes of oral language
disabilities?

5. Discuss the problems of measurement of oral language abilities.

6. How is ability in written composition measured?

7. Discuss the status of analytical testing of written language abilities.

8. Discuss some remedial drill materials of value in language
instruction.

9. What appears to be the most acceptable fundamental assumption
upon which the spelling vocabulary suitable for elementary-school
instruction should be based?

10. Show how a spelling test can be made from a standard spelling scale.

11. How can a spelling test made up of socially useful words be
validated for use in a classroom in which a textbook in spelling based
on a vocabulary of unknown social significance is in use?

12. What range of difficulty in words would you select for the purpose 
of measuring a class of unusually poor spelling ability? 

13. Discuss the pupil work habits that have diagnostic significance in 
the field of spelling. 

14. Which of the objectives or outcomes of instruction in handwriting 
are most defensible from a social utility point of view? 

15. What is the relationship between handwriting speed and quality? 

16. What place have standards in the evaluation of handwriting? 

17. Describe some of the methods of diagnosing handwriting ability. 

18. Discuss the implications of parental responsibility for
left-handedness and the relation of handedness to handwriting, language, and
speech.


Selected References 


ASHBAUGH, E. J. Iowa Spelling Scales. Iowa City: Bureau of Educational
Research and Service, State University of Iowa, 1945.

BETTS, EMMETT A. “Guidance in the Critical Interpretation of
Language.” Elementary English, 27:9-18 ff.; January 1950.

BOYNTON, MARCIA. “Inclusion of ‘None of These’ Makes Spelling Items
More Difficult.” Educational and Psychological Measurement,
10:431-32; Autumn 1950.

BUROS, OSCAR K., editor. The Fourth Mental Measurements Yearbook.
Highland Park, N. J.: Gryphon Press, 1953. p. 294-317, 323-32,
I-43.

BUROS, OSCAR K., editor. The Nineteen Forty Mental Measurements
Yearbook. Highland Park, N. J.: Mental Measurements Yearbook,
1941. p. 100-40.

BUROS, OSCAR K., editor. The Nineteen Thirty Eight Mental
Measurements Yearbook. New Brunswick, N. J.: Rutgers University Press,
1938. p. 72-79.

BUROS, OSCAR K., editor. The Third Mental Measurements Yearbook.
New Brunswick, N. J.: Rutgers University Press, 1949. p. 218-47.

CALLEWAERT, H. “A Rational Technique of Handwriting.” Journal of
Educational Research, 41:1-12; September 1947.


462 THE SECONDARY SCHOOL 


COOK, WALTER W. “Evaluation in the Language-Arts Program.” Teaching
Language in the Elementary School. Forty-Third Yearbook of the
National Society for the Study of Education, Part II. Chicago:
Department of Education, University of Chicago, 1944. Chapter 9.

DIEDERICH, PAUL B. “The Measurement of Skill in Writing.” School
Review, 54:584-92; December 1946.

DIEDERICH, PAUL B. “Teaching English with Test Exercises.” School
Review, 55:80-86; February 1947.

FORAN, THOMAS G. The Psychology and Teaching of Spelling.
Washington, D. C.: Catholic Education Press, 1934.

FREEMAN, FRANK N. “Contributions of Research to Special Methods:
Handwriting.” The Scientific Movement in Education. Thirty-Seventh
Yearbook of the National Society for the Study of Education, Part II.
Bloomington, Ill.: Public School Publishing Co., 1938. Chapter 6.

FREEMAN, FRANK N., AND DOUGHERTY, MARY L. How To Teach
Handwriting. Boston: Houghton Mifflin Co., 1923.

GREENE, HARRY A. A Criterion for the Course of Study in the Mechanics
of Written Composition. University of Iowa Studies in Education,
Vol. VIII, No. 4. Iowa City: University of Iowa, November 4, 1933.

GREENE, HARRY A. “Contributions of Research to Special Methods:
English Usage.” The Scientific Movement in Education.
Thirty-Seventh Yearbook of the National Society for the Study of Education,
Part II. Bloomington, Ill.: Public School Publishing Co., 1938.
Chapter 9.

GREENE, HARRY A. “English—Language, Grammar, and Composition.”
Encyclopedia of Educational Research. Revised edition. New York:
Macmillan Co., 1950. p. 383-96.

GREENE, HARRY A., AND GRAY, WILLIAM S. “The Measurement of
Understanding in the Language Arts.” The Measurement of
Understanding. Forty-Fifth Yearbook of the National Society for the Study
of Education, Part I. Chicago: University of Chicago Press, 1946.
p. 176-89.

HARRIS, CHESTER W. “Prediction of the Difficulty Index of
Objective-Type Spelling Items.” Educational and Psychological Measurement,
7:319-25; Summer 1947.

HILDRETH, GERTRUDE H. “Evaluation of Spelling Word Lists and
Vocabulary Studies.” Elementary School Journal, 51:254-65; January
1951.

HORN, ERNEST. A Basic Writing Vocabulary: 10,000 Words Most
Commonly Used in Writing. University of Iowa Monographs in Education,
First Series, No. 4. Iowa City: State University of Iowa, 1926.

HORN, ERNEST. “Contributions of Research to Special Methods:
Spelling.” The Scientific Movement in Education. Thirty-Seventh Yearbook
of the National Society for the Study of Education, Part II.
Bloomington, Ill.: Public School Publishing Co., 1938. Chapter 8.

HORN, ERNEST. “Spelling.” Encyclopedia of Educational Research.
Revised edition. New York: Macmillan Co., 1950. p. 1247-64.

Iowa Elementary Teachers Handbook, Vol. 3. Spelling and Handwriting.
Des Moines: Iowa State Department of Public Instruction, 1943.

Iowa Elementary Teachers Handbook, Vol. 4. Oral and Written
Language. Des Moines: Iowa State Department of Public Instruction,
1944.

JOHNSON, LESLIE W. “One Hundred Words Most Often Misspelled by
Children in the Elementary Grades.” Journal of Educational Research,
44:154-55; October 1950.

JOHNSON, WENDELL, AND OTHERS. Speech Handicapped School
Children. New York: Harper and Brothers, 1948.

JORDAN, A. M. Measurement in Education. New York: McGraw-Hill
Book Co., Inc., 1953. p. 117-41, 144-52, 156-67.

LESTER, JOHN A., AND LINDQUIST, E. F. “Examinations in English.”
The Construction and Use of Achievement Examinations. Boston:
Houghton Mifflin Co., 1936. p. 410-37.

MARCKWARDT, ALBERT H., AND WALCOTT, FRED G. Facts about Current
English Usage. National Council of Teachers of English, English
Monograph No. 7. New York: D. Appleton-Century Co., Inc.,
1938.

McKEE, PAUL. “An Adequate Program in the Language Arts.” Teaching
Language in the Elementary School. Forty-Third Yearbook of the
National Society for the Study of Education, Part II. Chicago:
Department of Education, University of Chicago, 1944. Chapter 2.

MULGRAVE, DOROTHY I. Speech for the Classroom Teacher. Revised
edition. New York: Prentice-Hall, Inc., 1946.

SPACHE, GEORGE. “Spelling Disability Correlates I—Factors Probably
Causal in Spelling Disability.” Journal of Educational Research,
34:561-86; April 1941.

STRICKLAND, RUTH G. The Language Arts in the Elementary School.
Boston: D. C. Heath and Co., 1951.

THORNDIKE, EDWARD L. A Teacher's Word Book. Revised edition. New
York: Teachers College, Columbia University, 1931.

TIDYMAN, WILLARD F., AND BUTTERFIELD, MARGUERITE. Teaching the
Language Arts. New York: McGraw-Hill Book Co., Inc., 1951.

TRAVIS, LEE E. “Diagnosis in Speech.” Educational Diagnosis.
Thirty-Fourth Yearbook of the National Society for the Study of Education.
Bloomington, Ill.: Public School Publishing Co., 1935. Chapter 19.

TRAXLER, ARTHUR E. The Use of Test Results in Diagnosis and
Instruction in the Tool Subjects. Revised. Educational Records Bulletin
No. 18. New York: Educational Records Bureau, January 1937.
p. 31-43, 62-73.

WEITZMAN, ELLIS, AND McNamara, WALTER J. Constructing Classroom 
Examinations. Chicago: Science Research Associates, 1949. p. 54-63. 

WEST, PAUL V., AND FREEMAN, FRANK N. “Handwriting.”
Encyclopedia of Educational Research. Revised edition. New York:
Macmillan Co., 1950. p. 524-29.

WOOD, BEN D., AND HAEFNER, RALPH. Measuring and Guiding
Individual Growth. New York: Silver Burdett Co., 1948. p. 291-98.


17 


Measuring and Evaluating in the 
Foreign Languages 


THE PURPOSE of this chapter is to present a summary of the following
points involved in measurement and evaluation in the foreign
languages:

A. Objectives of foreign language instruction.

B. Identification of measurable outcomes in the foreign
languages.

C. Trends in the modern foreign languages.

D. Measuring achievement in the modern foreign languages.

E. Measuring achievement in Latin.

F. Aptitude and prognostic testing in the foreign languages.

G. Diagnosis and remediation in the foreign languages.


The third language arts area at the high-school level is the 
foreign languages. Both the modern and the classical foreign lan- 
guages involve receptive as well as expressive language arts. They 
also naturally involve a second language for the pupil. Hence, the 
measurement problems pointed out in Chapters 15 and 16 for the
English language carry over to the foreign languages and are further
complicated by the fact that the beginner is acquiring certain aspects
of a second language in terms of his native tongue. This holds true
despite the fact that the aims and objectives of foreign language
study do not necessarily parallel those appropriate for the pupil's
native language.
OBJECTIVES AND OUTCOMES OF THE FOREIGN LANGUAGES


Some problems encountered in defining aims and objectives for the 
English language arts and the foreign languages are similar, but there 
is no tendency to uphold the same objectives or the same levels of 
skill and proficiency for the pupil in a foreign language as may be 
widely accepted for his native tongue. The two sets of objectives are, 
on the contrary, distinctly different. The modern and the classical 
foreign languages also differ so widely in their present-day values 
and uses that separate sets of general objectives again are necessary. 


Outcomes of the modern foreign languages 


Although the literature on the modern foreign languages is replete 
with articles treating instructional objectives and outcomes, most of 
the articles deal only with aspects of the broad issue or treat ob- 
jectives and outcomes for single languages rather than for the modern 
languages in general. The authors believe that the following list
summarizing the discussion of instructional outcomes by Kaulfers¹ is
representative:


1. Direct outcomes
a. Reading
b. Writing
c. Speaking
d. Translating

2. Concomitant outcomes, such as
a. Mental discipline
b. Habits of neatness in written work
c. Habits of attention to...language mechanics...
d. Critical attitude toward correct usage in language...
e. Insights into the mechanical structure of language
f. Habits of consulting the dictionary
g. Better speech habits, pronunciation, and diction

3. Associate outcomes, contributive to
a. Worthy use of leisure
b. Ability to understand, adjust to, and cooperatively improve the
social environment

1 Walter V. Kaulfers, Modern Languages for Modern Schools. McGraw-Hill Book 
Co., Inc., 1942. p. 349-56. By permission. 
c. Ability to understand, appreciate, adjust to, and improve the
physical environment
d. Building of desirable physical and mental health
e. Increased vocational and prevocational efficiency


Kaulfers distinguished the direct outcomes as subject to control
by the teacher in his selection of methods, classroom activities, and
course content.² He classed under concomitant outcomes those values
resulting from experiences of the pupil in the process of acquiring
the direct outcomes³ and under associate outcomes the values
accruing primarily from the content used in the attainment of the direct
outcomes.⁴

Agard and Dunkel⁵ classified objectives as basic skills and as
those resulting from the acquisition of basic skills:


1. Basic linguistic skills objectives
a. Reading 
b. Writing 
c. Speaking 
d. Aural comprehension 
2. Objectives for which the language skills serve as means, such as 
a. Disciplinary training in neatness, accuracy, and logical thought 
b. Increased understanding of ...language as a means of com- 
munication and as a tool of thought 
c. Better command of one’s native language 
d. Knowledge of the foreign people’s history, culture, and
civilization....
e. Increased international understanding and good will 
f. Development of historical and cultural perspective 


Trends in the modern foreign languages 


The need for more foreign language instruction and for new in- 
structional emphases in the foreign languages has become increas- 
ingly clear during the last fifteen years. World War II brought 
tremendous demands for military and civilian personnel trained in 
the use of a large number of the modern foreign languages. Increas- 
ingly rapid facilities for transportation and communication have 


2 Ibid. p. 355.
3 Ibid. p. 349.
4 Ibid. p. 350.
5 F. B. Agard and Harold B. Dunkel, An Investigation of Second-Language
Teaching. Ginn and Co., Boston, 1948. p. 15-16.
brought the people of the world closer together and have created
demands for wider language training. The Good Neighbor Policy 
and the United Nations have further emphasized the need for im- 
proved understandings among the peoples of the world. These de- 
velopments make mandatory a functional approach to the teaching 
of modern foreign languages. 

The impact of these three developments on methods of teaching 
the modern foreign languages has been great, although the ultimate 
effect has not yet become clear. These influences are traced briefly 
below in order to establish acceptable emphases for the various in- 
structional objectives and outcomes discussed above. 

Personnel able to speak and to comprehend a wide variety of 
foreign languages was urgently needed during World War II. The 
need was met by concentrated programs of instruction emphasizing 
the speaking and listening objectives and neglecting or at least mini- 
mizing the time-honored reading and writing objectives. The com- 
plexity of the problem is made apparent when it is recognized that 
there are some 2800 different languages spoken in the modern world, 
although probably under one hundred of these are of major impor- 
tance culturally and economically and only about twenty are so 
widely used that their importance is paramount.6 

A modern world in which developments in transportation and 
communication bring its peoples closer and closer together and in 
which the United Nations and other international agencies enhance 
the need for mutual understanding requires effective modes of com- 
munication among the citizens of many countries and provinces. 
McGrath 7 emphasized the need for an increased instructional em- 
phasis on the spoken language and suggested the offering of a second 
language in many schools on at least an optional basis in the inter- 
mediate grades. He also recommended more widespread teaching 
of foreign languages in the high school and college. 

The third influence, best represented by the Good Neighbor Policy, 
emphasizes the need not only for an increasing stress on functional 
values but also for attention to a newly recognized instructional 
area— English as a foreign language. 

Agard and Dunkel,8 with whose point of view many writers in the 
area of modern foreign languages have expressed agreement, sup- 
ported the belief that the speaking and understanding, or oral-aural, 
objectives are of primary significance today and that they are 
superseding the reading objective in that position. These authors,9 
as well as Tharp and his colleagues,10 presented evidence to show 
that most of the non-skills objectives and in fact the reading ob- 
jective itself have been very imperfectly realized in typical school 
situations. 

6 Jack Cohn, “The Implications of the Current World Situation for Foreign 
Language Instruction.” Modern Language Journal, 36:402-4; December 1952. 

7 Earl J. McGrath, “Language Study and World Affairs.” Modern Language 
Journal, 36:205-9; May 1952. 

8 Agard and Dunkel, op. cit. p. 17-38. 

English as a foreign language. The concept of English as a foreign 
language has only recently received much attention in America. The 
Good Neighbor Policy, entailing the teaching of English widely in 
the western hemisphere, and the increasing number of displaced 
persons coming to America from various European countries with 
facility only in their native tongues have given rise to the concept 
in this country. Fries 11 wrote what is probably the only American 
textbook for use in teaching English to such persons. Lado 12 con- 
structed what is probably the only published test of English as a 
foreign language. 

The Cooperative Inter-American Tests deserve mention here, al- 
though they are not designed for use in foreign language courses, 
are not restricted in their coverage to the high-school level, and are 
not, in fact, used solely in the measurement of achievement. The 
series of tests printed in parallel English and Spanish editions was 
prepared under the auspices of the Committee on Modern Languages 
of the American Council on Education. Parallelism of forms in the 
two languages makes possible the use of these tests in bilingual 
situations and for the classification of pupils of foreign parentage or 
backgrounds. They are also useful in a single language when com- 
parisons across linguistic borders are not desired. 

Three types of tests are available in two forms each for English 
and Spanish. The tests of general ability, designed to measure general 
mental development rather than achievement, and of reading, stress- 
ing the understanding of words and the reading situation as a whole, 
are available at three levels: Primary for Grades 1 to 3, Intermediate 
for Grades 4 to 7, and Advanced for Grades 8 to 13. Three more 
specialized tests are issued only at the advanced level. These consist 
of a language usage test emphasizing ability to use English or Spanish 
and one test each in the natural sciences and the social studies stress- 
ing the ability to comprehend materials in these content areas. 

The work of Manuel and his collaborators in Mexico, Puerto Rico, 
and elsewhere in the development of this battery of tests represents 
one significant movement in the teaching and testing of English as a 
foreign language. Other American-sponsored programs serving a 
similar purpose by somewhat different methods are operating in such 
countries as Brazil, Costa Rica, El Salvador, and Haiti. 

9 Ibid. p. 16-17. 

10 James B. Tharp, Algernon Coleman, and Clara B. King, “Foreign Languages— 
Modern.” Encyclopedia of Educational Research, Revised edition. Macmillan Co., 
New York, 1950. p. 464-85. 

11 C. C. Fries, Teaching and Learning English as a Foreign Language. University 
of Michigan Press, Ann Arbor, 1945. 

12 Robert Lado, English Language Test for Foreign Students. George Wahr Pub- 
lishing Co., Ann Arbor, Mich., 1951. 


Foreign language in the elementary school. The teaching of a 
second language in the elementary school, a common practice in many 
foreign countries, has been introduced in a small number of American 
communities during the last few years. This is particularly true of 
certain southern states where there are many school children of 
Mexican parentage, but the movement is not entirely restricted to 
such communities. The authors are not aware of any tests of pro- 
ficiency in a foreign language for use at this grade level. 


Objectives of Latin 


The objectives of instruction in Latin naturally differ considerably 
from those in the modern foreign languages. The Classical Investiga- 
tion stated the immediate objective as the progressive development 
of the power to read and understand Latin through an increasing 
mastery of vocabulary, forms, and syntax. Ultimate objectives stated 
in this report included: 13 


1. Development of an increased understanding of those elements in 
English which are related to Latin 
2. Development of an increased ability to read, speak, and write Eng- 
lish 
3. Development of an increased ability to learn other foreign languages 
4. Development of correct mental habits 
5. Development of an historical and cultural background 
6. Development of right attitudes toward social situations 
7. Development of literary appreciation 
8. Development of an elementary knowledge of the simpler general prin- 
ciples of language structure 
9. Development of improved literary quality of written English 

13 American Classical League, The Classical Investigation, Part I, General Report. 
Princeton University Press, Princeton, N. J., 1924. 


More recently, the objectives of high-school Latin teaching were 
restated for a two-year course sequence by a Committee on Educa- 
tional Policies of the Classical Association of the Middle West and 
South. The report, dealt with in a symposium later the same year,14 
was restricted to a two-year program because 80 to 90 per cent of 
high-school pupils who study the subject do not continue beyond 
the second year.15 Else 16 stated the objectives for the two-year 
course sequence as follows: 


I. The developing high-school student should gain added proficiency 
in language through 
A. Increased awareness of the structure of language as a skeleton 
of speech and thought 
B. An improved ability to understand and use English words of 
Latin derivation 
C. A knowledge of actual Latin words and phrases commonly 
used in English 


II. The developing student ...needs to become more keenly aware of 
the roots of our culture. He should be conscious of the role played 
by classical culture in shaping not only our American tradition but 
the Western tradition as a whole. Such an understanding is not 
merely desirable but vitally necessary if Americans are to become 
citizens of the world. 


It is clear that the general pattern of the Classical Investigation 
is followed in the more modern list of objectives. In fact, the differ- 
ences between the earlier and later formulations lie primarily in the 
selection of materials and instructional emphases. The 1947 program 
recommended “a functional approach to the teaching of grammar, 
new reading material for the first year, the reading of Virgil's Aeneid 
during the second year, and the selection of a vocabulary which will 
take into consideration its usefulness as a source for building English 
words of Latin derivation.” 17 Dunkel indicated that this two-year 
program is designed primarily for those pupils not likely to study 
the subject more than two years, to limit the methods used to the 
job at hand by simplification in the treatment of grammar, and to 
“discover and emphasize those objectives which promise to make 
the most sense to the greatest number of this diverse group of 
students we have.” 18 

14 Symposium, “Toward Improvement of the High-School Latin Curriculum.” 
Classical Journal, 43:67-90; November 1947. 

15 Fred S. Dunham, “Introduction.” Ibid. p. 67. 

16 Gerald F. Else, “Objectives and Overview.” Ibid. p. 74-75. 

17 Dunham, ibid. p. 67. 


2 MEASUREMENT IN THE MODERN LANGUAGES 


Measurable elements in the modern languages 


The four immediate objectives of instruction in the foreign lan- 
guages—the development of the ability to speak, to understand, to 
read, and to write the language—are of significance to the test-maker 
in this field. They primarily determine the character of the tests to 
be used in the classroom. If these general abilities are to be measured 
effectively, the tests must be broken into many parts, each capable 
of measuring specific elements. 


Standardized test methods 


Measurement of achievement in French typically employs tech- 
niques quite similar to those used in English and in the other modern 
languages. The emphasis is definitely upon testing the pupil's fa- 
miliarity with the reading and speaking vocabularies of the language, 
the pronunciation and enunciation of words and idioms, the spelling 
of words, and certain word rules of construction with their excep- 
tions. Achievement in German is generally expressed in terms of the 
ability of the pupil to translate from the foreign language to English 
and the reverse, and his abilities in grammar and syntax. Knowledge 
of vocabulary and grammar, and the ability to read and write the 
language are the chief elements of achievement in Spanish which are 
now capable of measurement. 

In this chapter, as in most of the other chapters on measurement 
and evaluation in subject areas, the authors have chosen to present 
only a few sample items from various tests to illustrate the applica- 
tion of different objective testing methods. To conserve space, direc- 
tions to the pupils are not given for the sample items except in 
instances of unusually complex item forms. The student should be 
sufficiently familiar with various item forms and their modifications, 
as presented in Chapter 7, that his interpretation of the samples given 
here should not be affected by the absence of such directions. 

18 Harold B. Dunkel, “Changing Latin in a Changing World.” Ibid. p. 71-73. 

Simple recall and completion items. Recall forms of objective 
items, less widely used in recently published tests than the multiple- 
choice form but nevertheless very satisfactory in the foreign lan- 
guages, are illustrated in Samples A and B by a simple recall form 
and a sentence completion form. In Sample B, the English word pre- 
ceding each blank is to be translated into the foreign language by 
the pupil in filling the blank. 


Sample A.19 


39. Who wrote Les trois Mousquetaires? 
40. Who wrote Les Misérables? 


Sample B.20 


47. [...] (was) ........ es recht schön. 
48. Die Frau öffnete den Brief und las (it) ........ 
49. Karl (went) ........ nach Hause. 


Multiple-choice items. This item form is used very widely and in 
many different adaptations in the foreign languages, both modern 
and ancient. Samples C to H present illustrations concerned with 
cultural backgrounds of the people, vocabulary, word forms, trans- 
lation, and reading comprehension. Sample H gives only one of the 
several items that are based on the reading selection. Sample I, from 
a test of aural French, shows only the portion of the test on which 
the pupil responds to the sentence read by the examiner, “Désignez 
un animal domestique." 


Sample C.21 


19. The Louvre was originally built as a 
19-1 museum. 
19-2 prison. 
19-3 court house. 
19-4 hospital. 
19-5 royal palace. .......... 19( ) 


19 Minnie M. Miller, French Life and Culture Test. Published by Bureau of Edu- 
cational Measurements, Kansas State Teachers College, Emporia, 1935. 

20 J. R. Aiken and Cora Held, First Year German Test. Published by Bureau of 
Educational Measurements, Kansas State Teachers College, Emporia, 1933. 

21 Geraldine Spaulding, Laura Towne, and Sarah W. Lorge, Cooperative French 
Test, Lower Level, Form S. Published by Cooperative Test Service, 1942. 
Sample D.22 


21. estampa 
21-1 horno 
21-2 clavo 
21-3 disparate 
21-4 aguja 
21-5 impresión .......... 21( ) 


Sample E.23 


7. Through what streets will they pass? 
Par ( ) rues passeront-ils? 
7-1 qui 
7-2 que 
7-3 quoi 
7-4 lesquelles 
7-5 quelles .......... 7( ) 


Sample F.24 


1. She must sew the dress. 
1-1 Sie mag das Kleid nähen. 
1-2 Sie will das Kleid nähen. 
1-3 Sie kann das Kleid nähen. 
1-4 Sie soll das Kleid nähen. 
1-5 Sie muss das Kleid nähen. 


Sample G.25 


9. Los niños quieren que la madre les lea 
9-1 unas cestas. 
9-2 unas nueces. 
9-3 una bicicleta. 
9-4 un pajarito. 
9-5 un cuento. 


22 E. Herman Hespelt, Robert H. Williams, and Geraldine Spaulding, Cooperative 
Spanish Test, Advanced, Form Q. Published by Cooperative Test Service, 1940. 

23 Geraldine Spaulding, Laura Towne, and Sarah W. Lorge, Cooperative French 
Test, Higher Level, Form S. Published by Cooperative Test Service, 1942. 

24 Emma Popper, Alice Miller, and Lucy M. Will, Cooperative German Test, Ele- 
mentary, Form P. Published by Cooperative Test Service, 1939. 

25 Jacob Greenberg, Robert H. Williams, and Geraldine Spaulding, Cooperative 
Spanish Test, Elementary, Form P. Published by Cooperative Test Service, 1939. 
Sample H.26 


Du côté nord, la ville est dominée par une colline le long de laquelle 
s'étend la route de Montargis. L'église, sur les pierres de laquelle le 
temps a jeté son riche manteau noir, car elle a été bâtie au XIVe 
siècle, se dresse au bout de la petite ville. Ombragée par quelques 
arbres, et mise en relief par une petite place, cette église solitaire 
produit un effet grandiose. 


89. Du côté nord, la colline 
89-1 s'éloigne de la route. 
89-2 s'élève au-dessus de la ville. 
89-3 cache la route. 
89-4 est plus longue que la route. 
89-5 est dominée par la route. .......... 89( ) 


Sample I.27 


eagle 
tiger 
cat 
alligator 
wheel 


Matching units. Standardized tests appear to use this form of item 
relatively little, and the few adaptations of the matching type of 
item that appear in such tests are too lengthy and complicated for 
presentation here. However, modified matching sets appear in some 
of the Cooperative Achievement Tests in the foreign languages. 


3 MEASUREMENT IN LATIN 


Measurement in Latin has thus far been restricted mainly to the 
immediate objectives of the subject. Effective measurement in Latin, 
as in every other field of learning, is dependent upon the exact 
identification of the specific elements that comprise the desired out- 
comes of instruction. To attempt to measure all of the objectives of 
instruction in Latin would lead to confusion and almost certain fail- 
ure. Many of these objectives emphasize cultural and disciplinary 
values which at present are not measurable. 

26 Spaulding, Towne, and Lorge, op. cit. 
27 Agnes L. Rogers and Frances M. Clarke, American Council Alpha French Test, 
Aural Comprehension. Published by Bureau of Publications, Teachers College, Co- 
lumbia University, 1933. 

The analysis of achievement in Latin indicates that the accom- 
plishment of the immediate objectives in the subject is dependent 
upon a variety of specific abilities many of which are measurable. 
Among these are such factors as the acquisition of vocabulary, and 
the oral and silent reading skills. 


Achievement tests in Latin 


Numerous tests designed to measure one or more of the elements 
involved in the accomplishment of the more immediate objectives of 
instruction in Latin are now available. By far the majority of these 
tests are intended to measure general achievement rather than 
specific factors underlying achievement. Many parallel quite closely 
the testing techniques developed in the field of English. Relatively 
little has been accomplished in the development of highly analytical 
or diagnostic instruments in this field. 


Standardized test methods 


Objective item forms used in standardized tests are very similar 
in all of the foreign languages. Many of the illustrations given in a 
preceding section of this chapter for the modern languages could 
equally well apply to Latin. Therefore, only three illustrations, 
Samples J to L, are given here as examples of measurement tech- 
niques in Latin. Sample J, of recall form, measures knowledge of 
word meanings, and Samples K and L illustrate the multiple-choice 
item form in two different situations for the testing of compre- 
hension. 


Sample J.28 


Directions. Each of the Latin nouns below is connected with the Latin word 
and gives us the English derivative 

[Item rows illegible in source: each gives a Latin noun followed by blanks 
for the related Latin word and the English derivative.] 


28 Harold G. Thompson and Jacob S. Orleans, New York Latin Achievement 
Test, Test 2. Published by World Book Co.. 1927. 
Sample K.29 


26. I had hoped that you would come. 
Spēraveram tē (——). 
26-1 venires 
26-2 vēnisse 
26-3 ventūrum esse 
26-4 ventum sis 
26-5 [...] .......... 26( ) 


Sample L.30 


3. Aliis nationibus superatis Belgae verebantur ne Romani (——). 
3-1 in hostes eorum impetum facerent 
3-2 eis praedam amplam donarent 
3-3 se in Italiam reciperent 
3-4 contra eos adducerentur 
3-5 e Gallia profecto excederent. .......... 3( ) 


4 APTITUDE AND PROGNOSTIC TESTS IN THE FOREIGN LANGUAGES 


The distinction between the closely similar aptitude and prog- 
nostic tests was pointed out in Chapter 3. Aptitude tests are es- 
sentially tests of specific intelligence, or of intellectual abilities 
influenced only slightly by training, which apply to the particular 
subject or ability for which aptitude is being measured. Prognostic 
tests are achievement measures that draw largely upon the abilities 
essential for success in a particular subject and that admittedly deal 
with abilities the pupil may have learned specifically or acquired 
incidentally. As the two types of tests may both be used in the 
determination of a pupil's readiness for the study of a particular 
subject, they are considered simultaneously in this section of the 
chapter. 

29 Harold V. King and Geraldine Spaulding, Cooperative Latin Test, Lower 
Level, Form S. Published by Cooperative Test Service, 1942. 
30 Harold V. King and Geraldine Spaulding, Cooperative Latin Test, Higher 
Level, Form S. Published by Cooperative Test Service, 1942. 


Predictive tests in the foreign languages 


It is probable that a large proportion of high-school failures in 
foreign languages could be avoided if an adequate 
system of guidance were employed to direct pupils out of the field 
when the essential aptitudes are missing. A few tests capable of yield- 
ing the necessary type of prognostic information for giving accurate 
guidance in the study of the foreign languages are now available, but 
much more attention should be given to the problem of determining 
which students are capable of studying foreign languages with profit. 
A pupil likely to be successful in the study of foreign languages must 
possess certain aptitudes for the learning of languages as well as the 
ability to see abstract relations. Pupils possessing these latter abilities 
will obviously secure more substantial returns from their study of 
languages than those who may be linguistically inclined but who 
have no ability to broaden their outlook. Most prognostic tests in 
the foreign languages have not provided for this important element 
in success and are therefore unable to furnish a very reliable basis 
for predicting who should study foreign languages. In common with 
most other prognostic tests, they are much more successful in point- 
ing out those who should not study the subject. The negative aspect 
of prognostic measurement is one of its most discouraging features. 

The Foreign Language Aptitude Test of the Iowa Placement Ex- 
aminations undertakes to furnish a prognostic measure by sampling 
four types of skills found to be highly related to success in the 
study of the foreign languages. A unique feature of this test is the 
use of Esperanto as the unfamiliar situation in which to test the 
student's ability to acquire new language skills. 

The Luria-Orleans Modern Language Prognosis Test is designed 
to measure the ability of pupils to learn French, Spanish, or Italian. 
It is intended to be used before study of the language is begun. The 
results are useful as a basis for advising pupils whether to begin a 
foreign language, as a guide in classifying students, and as an aid in 
determining the effectiveness of instruction. The test contains a 
number of simple lessons with a test on each, covering certain funda- 
mental principles and skills involved in learning a Romance language. 
The Orleans-Solomon Latin Prognosis Test is similarly used for 
predicting success in Latin. The Wilkins Prognosis Test in Modern 
Languages, the Symonds Foreign Language Prognosis Test, and the 
Language Aptitude Test of the George Washington University Series 
are other instruments that serve similar purposes. 
5 DIAGNOSIS AND REMEDIATION IN THE FOREIGN LANGUAGES 


Remedial work in foreign languages 


The problems of diagnosis and remedy in the modern foreign 
languages and Latin are essentially the same. Reading, pronunciation 
(accent), vocabulary (idiom), and grammar constitute the major 
remedial areas. Naturally, a careful diagnosis of possible causes of 
difficulty must be made. The results of intelligence tests will be help- 
ful in locating general causes of trouble. Prognostic and aptitude 
tests will also be useful in identifying students who apparently lack 
the special abilities upon which successful work in the foreign lan- 
guage depends. 

Reading tests of the work-study type will do much to reveal read- 
ing deficiencies. Such reading tests may be of much the same char- 
acter as those for silent reading in English. Such tests should be 
analytical within the limits of the subject, measuring such qualities 
as reading comprehension at the word level, comprehension in sen- 
tences and in larger units, reading rate, use of books, and methods 
of study. Informal objective tests in vocabulary, grammar, com- 
position, and reading and comprehension will also be useful. 

The analysis of causes of pupil failure is indispensable to remedial 
work. The results of such an analysis may readily provide the basis 
for preventive work designed to reduce the number of failures. Low 
mentality, large pupil turnover, bad health conditions, poor study 
conditions, ineffective methods of study, lack of cooperation between 
teacher and pupil, and classes that are too large or too heterogeneous 
are certainly some of the factors that necessitate an effective reme- 
dial program. Without doubt one very potent reason for failure in 
the foreign languages is the lack of suitable instruction in effective 
methods of study. Another important factor may be the failure of 
teachers to understand the varying interests, needs, and future in- 
tentions of their pupils, and to adjust their courses accordingly. 


Topics for Discussion 

1. How do the foreign languages differ from the English language arts 
as they are taught in American schools? 

2. How do direct, concomitant, and associate outcomes differ from 
each other in the modern foreign languages? 

3. What important changes of emphasis have taken place recently 
in the objectives of modern foreign language teaching? 

4. What are some of the recent developments that have contributed 
to changes in the objectives of the modern foreign languages? 

5. Of what significance is the study of English as a foreign language? 

6. What is the place of modern foreign language teaching in the 
secondary school? In the elementary school? 

7. What distinguishes immediate from ultimate objectives of instruc- 
tion in Latin? 

8. In what respects do the instructional objectives of the modern 
foreign languages and of Latin differ? 

9. For what major reason have the objectives of instruction in Latin 
been modified recently? 

10. Point out the differences in the types of measurement techniques 
that are used in the foreign languages and in English. 

11. What is the place of aural-oral testing in the modern foreign lan- 
guages? 

12. What is the value of prognostic testing in the foreign languages? 

13. In your judgment what are the greatest weaknesses of foreign 
language tests? 

14. What are some of the major causes of pupil failure in foreign 
language courses? 
Measuring and Evaluating in the 
Social Studies 


THIS CHAPTER presents a summary of the following points in the 
improvement of measurement and instruction in the social studies: 


Nature and organization of the social studies. 
Objectives and outcomes of the social studies. 
Kinds of tests in the social studies. 
Standardized tests in history, civics, and geography. 
Classroom testing and evaluating in the social studies. 


The social studies deal primarily with past and current problems 
of human relationships and with the interactions of human beings 
as they associate with one another in varied political, economic, and 
social activities. Such school subjects as history, geography, civics, 
sociology, economics, and problems of democracy are included in this 
area. Carr and his colleagues stated that the social sciences “are those 
bodies of scholarly materials which deal with human relationships,” 
and that the social studies “are those portions of the social sciences 
which have been selected for instructional purposes.” 1 

Two other terms in the area of the social sciences have recently 
come into use. Social learning is conceived by Moffatt and Howell to 
be broader than the social studies and to include “the social growth 
and development of the child as achieved through his total ex- 
periences,” whereas they indicated that social education, sometimes 
used as a synonym for social studies, “applies to all those activities 
that contribute to the child's social learning.” 2 The broadened con- 
cept of the social studies embodied in these statements is reflected 
in portions of this chapter. 

1 Edwin B. Carr and others, “Social Studies.” Encyclopedia of Educational Re- 
search, Revised edition. Macmillan Co., New York, 1950. p. 1214. 


1 SCOPE OF SOCIAL STUDIES 


The formulation of definite objectives in the social studies is a 
major problem, for research techniques useful in the establishment 
of objectives in such skill areas as arithmetic, reading, language, 
and spelling are difficult to apply in this area. There are, in fact, 
probably no scientifically established objectives for the social studies, 
which remain, in contrast to the areas emphasizing skills, a field in 
which content occupies a central position. This is still true even 
though modern social studies instruction places much more emphasis 
upon social skills than do more traditional methods. 


Objectives of the social studies 


The rather detailed list given below specifies the understandings, 
attitudes, and skills or abilities that social studies instruction should 
develop in pupils.3 


1. Understandings 


a. Of the democratic faith and its meaning for human welfare and 
happiness 

b. Of the application of democratic faith in the development of the 
American heritage 

c. Of the forces which have made for world interdependence and the 
need for world organization 

d. Of the historical and geographic reasons for the behavior of 
regional and national groups 

e. Of the local community and its problems, and the need for wide 
participation in community concerns by all citizens 

f. Of the significance in social problems of the mental health and 
emotional balance of individual human beings 


2 Maurice P. Moffatt and Hazel W. Howell, Elementary Social Studies Instruction. 
Longmans, Green and Co., New York, 1952. p. 12. 

3 Scope and Sequence of the Social Studies Program. Wisconsin Cooperative 
Education Planning Program, Bulletin No. 14. State Department of Public In- 
struction, Madison, Wis., November 1947. p. 6-7. 
2. Attitudes 

a. That all human beings regardless of race, national origin, color, 
or any matter over which they may have no control are entitled 
to equal rights to life, liberty, and the pursuit of happiness 

b. That we concern ourselves with achieving and improving human 
welfare and democratic liberties everywhere in the world 

c. That all citizens should participate actively in working toward 
the solution of community problems for social betterment 

d. That reflective group thinking can serve as an approach toward 
the solution of social problems 


3. Skills and/or abilities 

a. The ability to take part in group discussion 

b. The ability to take part in group planning 

c. The ability to think reflectively on social problems 

d. The ability to search out and use valid and adequate sources of 
information 

e. The ability to evaluate ideas and opinions on controversial prob- 
lems offered by and through radio, movies, newspapers, peri- 
odicals, books, etc. 


The student should note that these objectives are listed as under- 
standings, attitudes, skills, and abilities. The best modern thinking 
in the social studies results in objectives of this definite type rather 
than in the indefinite and vague objectives that are often listed 
even today. 


Organization of the social studies 


The question of whether to organize the social studies according 
to the traditional subject divisions, to integrate the various specific 
subjects into a unified course, or even to integrate the social studies 
and other areas of knowledge into a core curriculum has received 
much attention from students in this field. Unified courses are based 
on the theory that the best way to prepare pupils to meet the 
problems they must face in life is to disregard subject divisions and 
to assemble materials from all sources possible. The core curriculum 
goes still farther in that it completely ignores traditional subject 
boundaries. 

Believers in the unified course ignore history, geography, civics, 
and economics as separate subjects and embody material from all of 
them in a single course. The core curriculum embodies the concept 
of social education and emphasizes social learning in attaining its 
goals. There has been a strong tendency toward an integration of the 
social studies, especially in the elementary grades, and the tendency 
has even extended in some degree to the junior- and senior-high- 
school grades where subject lines have usually been quite clearly 
drawn. 

Although the results of comprehensive investigations show a trend 
toward unification of the social studies in the secondary school and 
the course in problems of democracy represents a partial integration 
of content in this area, many schools continue to teach subject matter 
as traditionally organized. Since testing necessarily lags behind the 
development of the curriculum, there will be a real need for stand- 
ardized tests and other evaluative devices in the newly organized 
instruction in this field as classroom practices change. 


2 OUTCOMES OF SOCIAL STUDIES 


General outcomes of the social studies 


It is important that instructional objectives be restated as outcomes 
in terms of the behaviors developed in pupils. The teacher is better 
able to measure and evaluate pupil growth in a subject area as 
complex as the social studies through an understanding of such 
outcomes than through an understanding of instructional objectives 
alone. 

Wrightstone and Campbell listed six characteristics a pupil should 
possess if he is to become an effective citizen in a democratic society.4
Their statements, in the form of behavioral outcomes, are that the
pupil:


1. Is an individual who is motivated by democratic attitudes and
beliefs.

2. Is interested in and sensitive to the problems of the community and
of the nation in which he lives.

3. Develops powers of critical and objective thinking.

4. Has suitable work and study skills for acquiring new information and
knowledge.

5. Has historical perspectives and concepts and information so that he
can make a balanced appraisal of current events, movements, and
thought in relation to events which have occurred in the past.

6. Is able to adapt himself to the personal and social conditions which
surround and confront him.


4 J. Wayne Wrightstone and Doak S. Campbell, Social Studies and the American
Way of Life. Row, Peterson and Co., Evanston, Ill., 1942. p. 38-39.


Specific outcomes of the social studies 


Lists of outcomes such as that given above are still a step removed 
from the needs of the teacher. Outcomes must be made specific and 
recognizable to the teacher in terms of definite pupil behavior. The 
classification outlined below is by Anderson, Forsyth, and Morse.5 


A. Acquiring Functional Information.
1. Understanding special vocabulary.
2. Understanding chronological relationships.
3. Understanding maps.
4. Understanding graphs and tables.


B. Analyzing Social Problems.
1. Knowledge of important concepts, generalizations, and findings.
2. Locating, selecting, organizing, and evaluating information.
3. Drawing conclusions and stating them effectively.
4. Applying social facts, generalizations, and value principles to
new problems.


C. Practicing Desirable Social Relationships.
1. Understanding and developing values consistent with the
democratic way of life.
2. Understanding the social implications of specific facts and types
of behavior.
3. Applying democratic values...in judging the desirability of
policies and courses of action.
4. Understanding the importance of social action to further the
solution of social problems, and being willing to take such action.


Although the authors of these outcomes outlined methods of
measuring and evaluating such behaviors,6 space in this volume
permits only a listing of the outcomes and illustrations of tests and
techniques designed to measure some characteristics of these and
closely similar types.


5 Howard R. Anderson, Elaine Forsyth, and Horace T. Morse, "The Measurement
of Understanding in the Social Studies." The Measurement of Understanding,
Forty-Fifth Yearbook of the National Society for the Study of Education, Part I.
University of Chicago Press, Chicago, 1946. p. 72-80. Quoted by permission of the Society.

6 Ibid. p. 80-101.


Problems of measuring outcomes 


The difficulty of measuring the outcomes of the social studies is 
great. Thus far, apparently, there has been too little careful analysis 
of the several subjects into the desired knowledges, skills, concepts, 
understandings, interests, and attitudes to permit exacting curriculum 
and test construction. Two crucial problems relating to this deficiency 
are discussed briefly here. 

The importance of factual knowledges in the social studies is 
still a moot question. The modern emphasis upon the development 
of skills involved in social living and upon the development of work- 
study skills by the use of which pupils can locate factual knowledges 
as needed has in part resolved the conflict. Modern schools tend to 
emphasize carefully selected, functional facts directly useful in the 
solution of common social problems and to stress concepts, under- 
standings, and abilities to apply facts in problem-solving rather than 
to teach large numbers of facts indiscriminately. The selection of 
the facts to teach and which ones to teach as exact and as approxi- 
mate knowledges continue to be major problems, however. Conse- 
quently, the particular facts to test and the degree of knowledge to be 
expected of pupils continue to be measurement problems. 

A further problem exists in the area of the skills necessary for 
effective social living. Many of them cannot be measured directly 
in the behavior of the school child because the pertinent behaviors 
are not often evidenced in the school. They are often revealed in the 
pupils’ out-of-school life and are therefore not subject to direct 
measurement. They may even be of types for which only the adult 
behavior of the present school child is the true criterion. Conse- 
quently, the problem of measurement and evaluation is the degree 
to which present school behavior is representative of out-of-school 
and even adult behavior and the degree to which results from social 
attitudes tests accurately represent social actions. These questions 
have not as yet been answered satisfactorily.


3 STANDARDIZED SOCIAL STUDIES TESTS


Kinds of tests in history, civics, and geography 


The selection of the basic facts to be taught and tested is one of 
the very serious problems of measurement in the social studies. The 
available body of facts in geography, history, and civics is large and 
the rapid pace of events today results in constant and great in- 
creases in the content of these fields. It is not so much the need for 
knowledge of the array of facts as it is the determination of those 
likely to last long enough in a rapidly changing world to deserve 
special emphasis in instruction and in testing that complicates the 
problem. In their efforts to meet the problem of which facts to teach 
and test, most workers in these fields have made their courses of
study and their tests more and more comprehensive, hoping thereby 
to satisfy the ideas of all concerning the basic items. Too often this 
has led both teacher and pupils to emphasize mere memorization of 
extensive catalogs of facts. As a result, these facts are too frequently 
mastered merely as facts, and not in order that they may give the 
pupil a better understanding of life and human relationships. 

The real deficiency in existing tests in the social studies, however, 
is not that they are designed primarily to measure the informational 
aspects of the subject but that other abilities more important to the 
attainment of the larger objectives of social studies instruction have 
not been provided for. Unfortunately, when a standardized test is 
used the particular outcomes it measures tend to be given special 
attention by both teacher and pupils. As a result, important objectives 
other than those emphasized in the tests are likely to be neglected. 

The majority of the tests available at present in the social science 
fields are of doubtful value for diagnostic purposes. Three general 
groups of tests in the social studies may be identified: (1) tests of
facts and information, (2) tests of ability to solve social problems, 
and (3) tests of civic, social, and economic attitudes. 

Factual tests. Tests of facts and information are by far the most 
numerous of the tests in the social studies. This is to be expected, for 
the pupil’s knowledge of certain facts or items of information is 
quite easily discovered. Furthermore, teachers of the social studies 
have tended to emphasize the acquisition of facts and information 
to the practical exclusion of other desirable general outcomes of 
instruction. Factual tests are of limited value for diagnostic purposes.
They fail to reveal why pupils do not know the facts if they have 
not been acquired. The factual tests do not aid the teacher very 
significantly in discovering the ability of pupils to use facts in their 
thinking in the social science fields. 

Problem-solving or thought tests. The development of the ability 
to utilize facts and basic principles in the attack on a novel social 
situation is one of the basic outcomes of teaching in the social studies. 
This type of problem-solving duplicates the steps in the ordinary 
process of thought. As in arithmetic, problem-solving in the social 
studies involves reading the problem to comprehend it, picking out 
the facts that are pertinent to the problem, choosing a method of 
solution, and testing the results for accuracy and probability. 

It is well recognized that knowledge of the facts necessary for the 
solution of a problem is no guarantee that the problem will be 
solved, nor can a problem be solved unless the necessary facts are
available. However, availability of facts in this day of widely avail- 
able library facilities does not depend only upon a knowledge of 
them by their prospective user. Many of the tests for various types 
of problem-solving abilities present the necessary facts to the pupils 
in the test so that the result will depend upon their abilities so to use 
the facts that they are able to solve the problems. 

Attitudes inventories. Since actions depend to such a large degree 
upon attitudes and emotional reactions, the measurement of attitudes 
resulting from instruction in the social sciences is as greatly needed 
as are tests of ability to solve problems. As a matter of fact, much 
attention is now being given in school to the development of the de- 
sirable traits of citizenship which are so much needed in later adult life. 
The attitudes inventories available are in the main much better 
adapted to the secondary- than to the elementary-school level.


Standardized tests in social studies subjects 


Most of the currently available standardized tests in history, 
civics and government, and geography were published some years 
ago. It is largely in the form of a few tests for general social studies 
and the social studies parts of achievement test batteries that new 
standardized tests have appeared for this field, although several new 
tests for certain high-school courses have been published recently.

History. Standardized history tests for the junior- and
senior-high-school grades are for American, world, medieval, and even
ancient history. The major emphasis of most of these tests is upon 
factual knowledges, although some of them satisfactorily measure 
the more complex and significant results of instruction requiring 
various applications and interpretations of factual data. 

Civics and government. Standardized tests in the field of civics and 
government are limited in number. In general, measurement here is 
as satisfactory as could be expected under the changing conditions 
now existing in the social studies. However, there is need for tests 
that attack important citizenship problems in a more positive and 
realistic manner than do most of the standardized tests in civics now 
available. 

Geography. Many tests are available in geography, but most of 
these are of the formal factual type. Few of the tests take into account 
the problem-solving aspects of social studies instruction. The major- 
ity of standardized tests in geography attack the subject as a study 
of places and their characteristics, whereas the modern approach to 
the study of geography has come largely to be founded upon the 
manner in which geographical factors influence human beings and 
the societies they establish. 


Standardized tests in general social studies 


Makers of standardized tests have recently been developing tests 
in the social studies at the junior-high-school and even the elemen- 
tary-school level to meet the needs of schools that may be offering 
the unified type of social studies course discussed in a preceding 
section of this chapter. Tests of this type are not uncommon for the 
high school, but most of the social studies tests for the elementary- 
and junior-high-school grades have been for particular courses until 
the last few years. These tests include material from history, from 
civics and government, and from geography, but subject-matter lines 
are broken down. Some of them include content from related sciences 
as well as from the social studies. 

Tests that measure broadly over the social studies must almost 
of necessity avoid some of the weaknesses of tests in particular sub- 
jects because of their lack of concern for divisions within the field. 
Furthermore, the few tests of this type are relatively new, and
consequently have the advantage of being constructed with regard for
recent thinking and experimentation with tests. Factual knowledges 
are less stressed and greater emphasis is placed upon relationships, 
applications, interpretations, and other reasoned uses of facts than 
is true on the average of standardized tests for particular courses. 


Interpretive tests in the social studies 


The types of tests discussed and illustrated in Chapter 9 as
interpretive in nature are more widely available for the senior high
school than for the elementary grades and junior high school. They
are not dealt with in this and other chapters on measurement and 
evaluation in subject areas because the authors classify them among 
the evaluation instruments, tools, and techniques to which Chapter 
9 is devoted. Such tests typically cut across the lines of demarcation 
between subject areas, so that a teacher interested in the measure- 
ment of direct outcomes from a course in American history or from 
a unified course in the social studies would not find them valid for 
his purposes. However, they measure broad functional outcomes of 
a type often sought by teachers of social studies and therefore
warrant the careful attention of teachers of this subject field. 


Standardized test methods 


The practice of presenting illustrative types of objective items to 
familiarize the student with representative measurement techniques 
is followed in this chapter. A desirable degree of knowledge on the 
part of the student concerning specific standardized tests could not 
be assured in the brief treatment possible here, however. The student
can gain such knowledge of particular tests only by examining them 
critically or even administering them to groups of pupils under 
standard conditions. 

A few sample items representative of test item techniques used 
by makers of standardized history, civics, problems of democracy, 
and general social studies tests are given in this section. These should 
serve the double purpose of acquainting students and teachers with 
standardized testing techniques and of suggesting to them types of
test items and exercises they can construct for their own informal
objective tests.7


7 See also Anderson, Forsyth, and Morse, op. cit. Chapter 5.

Simple recall items. The simple recall item is not widely used in
standardized social studies tests. Only one sample of this form is 
presented, for differences among various recall items exist mainly in 
minor details. The sample is of the basic simple recall form. 


Sample A.8


41 The successor of McKinley to the presidency
was named .... (41)


42 The battle cry of the Texan Army was “Remember
the ....” (42)


43 The ship in which Henry Hudson first sailed
up the Hudson River was called the .... (43)


Alternate-response items. Alternate-response items of the true-false 
or yes-no variety occasionally occur in secondary social studies tests, 
although other forms are probably now more common. The illus- 
tration is of an adaptation of the alternate-response item. 


Sample B.9


Questions 76 through 80 are descriptive of some governments. On your 
answer sheet, mark the number of your answer according to the KEY 
below. 


KEY   1. If it describes a democratic form of government.
      2. If it describes a totalitarian form of government.


76. State is supreme
77. Freedom of speech
78. Censorship of press
79. One political party
80. Respect for rights of minorities


8 M. H. DeGraff, G. M. Ruch, and H. A. Greene, Iowa General Information
Test in American History, Grades 7-12. Published by Bureau of Educational
Research and Service, University of Iowa, 1927.

9 Stanley E. Dimond and Elmer F. Pflieger, Dimond-Pflieger Problems of
Democracy Test. Published by World Book Co., 1952.

Multiple-choice items. The multiple-choice item is the most popular
for testing purposes in the secondary social studies subjects.
The illustrations given below are of the simple item form, of a 
variation used to test knowledge of vocabulary, and of a variation 
based on a map. 


Sample C.10


70. What was the principal work done on a
medieval manor?
70-1 Defending the castle against attack.
70-2 Buying from and selling to caravans.
70-3 Manufacturing shoes and cloth.
70-4 Farming.
70-5 Copying ancient manuscripts. ....70( )


71. The Greek city-states never united, largely
because of
71-1 geographic barriers.
71-2 different languages spoken in different cities.
71-3 religious differences.
71-4 conflicting forms of government.
71-5 opposition from Persia. .........71( )


Sample D.11


46. substitute   1 food          2 clothing
                 3 substance     4 replace        ____46

47. erosion      1 proposition   2 alliance
                 3 disintegration 4 concession    ____47

48. nullify      1 pacify        2 return
                 3 cancel        4 repeat         ____48

49. humidity     1 cupidity      2 disposition
                 3 moisture      4 secretion      ____49

50. league       1 disease       2 alliance
                 3 departure     4 pattern        ____50


10 Harry D. Berg and Elaine Forsyth, Cooperative Social Studies Test for Grades
7, 8, and 9. Published by Cooperative Test Service, 1947.

11 Ernest W. Tiegs and Willis W. Clark, California Reading Test, Reading
Vocabulary in Social Science, Advanced. Published by California Test Bureau, 1950.

Sample E.12


DIRECTIONS. Questions 51 through 58 are based on the map above. For
each question there are five possible answers. You are to decide which 
answer is correct; then mark the corresponding space on your answer 
sheet. 


51. The western terminal of the Erie Canal is represented on the map 
by— 
a. [illegible]   b. 2   c. 6
d. [illegible]   e. none of the above


52. America's leading seaport is represented on the map by— 


a. 2   b. [illegible]   c. 7
d. 19   e. none of the above


Matching exercises. Matching exercises are less popular than 
multiple-choice item forms for the testing of achievement in secon- 
dary social studies courses. Balanced matching exercises are some- 
times used, but exercises based on the multiple use of one category 
of items and on graphs or maps are probably more common. Illustrations
are given below for two of these types.


12 Ryland W. Crary, Crary American History Test. Published by World Book
Co., 1950.

Sample F (Exercise shown only in part).13


Champlain (1)       36. Pioneered in Kentucky ............... ( )
James Wolfe (2)     37. Captured Fort Ticonderoga ........... ( )
Ethan Allen (3)     38. Helped to settle Jamestown .......... ( )
John Winthrop (4)   39. Famous missionary to the Indians .... ( )


Sample G (Exercise shown only in part).14


DIRECTIONS. After each event in the list below put the number— 


1 if it happened before the Settling of Jamestown in 1607.
2 if it happened between the Settling of Jamestown and the Adoption
of the Constitution in 1787.
3 if it happened between the Adoption of the Constitution and the Civil
War in 1861-65.
4 if it happened between the Civil War and the Spanish-American War
in 1898.
5 if it happened since the Spanish-American War.


For example, you should write the number 1 after "Columbus
discovered America," because it happened before the Settling of
Jamestown.


SAMPLE. Columbus discovered America ........................ (1)
42. Columbus sailed from Europe to find a route to India ... ( )
43. Plastics came into wide use ............................ ( )
44. The first airplane trip over the Atlantic was made ..... ( )
45. New York was first settled by the Dutch ................ ( )
46. The airplane was invented .............................. ( )
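

The five-period key used in Sample G can be sketched as a small routine, which may help a teacher who wishes to prepare a scoring key for a chronology exercise of this kind. The sketch below is illustrative only and is not part of the published test. The function name `period_key` and the event dates are supplied here as assumptions, and the treatment of boundary years (an event in a boundary year is placed in the later period, and the Civil War years are counted with period 4) is likewise an assumption, since the directions do not settle it.

```python
def period_key(year: int) -> int:
    """Return the key number (1-5) for the period in which a year falls.

    Boundary handling is an assumption of this sketch: a boundary year
    is placed in the later period, and 1861-65 is counted with period 4.
    """
    if year < 1607:   # before the Settling of Jamestown
        return 1
    if year < 1787:   # Jamestown to the Adoption of the Constitution
        return 2
    if year < 1861:   # Constitution to the Civil War
        return 3
    if year < 1898:   # Civil War to the Spanish-American War
        return 4
    return 5          # since the Spanish-American War


# Hypothetical dates for three of the events named in the exercise.
events = {
    "Columbus discovered America": 1492,
    "New York was first settled by the Dutch": 1624,
    "The airplane was invented": 1903,
}

for event, year in events.items():
    print(period_key(year), event)
```

A key built in this way depends entirely on the dates the teacher assigns to each event, so events without a single agreed date, such as the wide use of plastics, are best keyed by judgment rather than by rule.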


4 CLASSROOM TESTING AND EVALUATING IN SOCIAL STUDIES 


Informal objective tests in the social studies 


More attention has been given in the educational literature to 
informal objective testing methods for the social studies of the
high-school level than of the elementary-school level, except possibly
for geography. Several sources of items for informal objective tests
and guides for constructing classroom tests are listed at the end of
this chapter.


13 E. C. Denny and M. J. Nelson, Denny-Nelson American History Test, Grades
7 and 8. Published by World Book Co., 1928.

14 Richard D. Allen and others, Metropolitan Achievement Tests, Test 7, Social
Studies: History, Advanced. Published by World Book Co., 1950.

Two means of evaluating instructional outcomes of the social 
studies informally are open to the teacher: (1) the construction of 
informal objective tests, and (2) the use of other evaluative
procedures. Illustrations and discussions of item types in the preceding
section of this chapter should aid the teacher in constructing objective 
classroom tests. The program of evaluation quoted below deals 
largely with devices of a non-test nature. 


An evaluation program 


A comprehensive program for the evaluation of the instructional 
outcomes of the social studies is given below for its value in 
suggesting a variety of suitable measurement techniques to supple- 
ment paper-and-pencil tests. Wesley pointed out that although most 
of the suggested techniques are objective, materials for all of them 
are not in existence. A challenge is thereby presented to the teacher 
to devise his own evaluation instruments in such cases. The first 
five points of Wesley's program dealt with evaluation of the school 
program and of the teacher, so only those statements dealing with 
evaluation of the pupil are reproduced here. 


A PROGRAM OF EVALUATION 15


Entity               Elements and Techniques of Evaluation


6. Concepts          Ability to give examples; ability to choose correct
                     or best definition; ability to give a definition;
                     ability to match with an example or a definition;
                     presence or absence of the word in the pupil’s
                     speaking and writing vocabulary; pupil-made
                     lists of synonyms, of words by categories, and
                     of other assigned patterns of words; measurement
                     of the number of connotations which pupils
                     know.

7. Locating          Observation of pupils in search of materials; tests
   Materials         of familiarity with selected books, references,
                     and bibliographies; time tests of skill in using
                     index, table of contents, title page, and card
                     catalogue and in finding words in dictionary and
                     articles in encyclopedia; tests of discrimination
                     in choice of sources for finding answers to given
                     questions.

8. Appraising        Ability to distinguish between sources and secondary
   Materials         accounts, to sense degrees of reliability, to
                     sense degrees of probability; ability to recognize
                     authorities; lists of books read, shows attended,
                     radio programs selected, lectures attended; tests
                     involving attitude toward superstitions; tests for
                     sensitivity to inconsistencies; ability to distinguish
                     fact from opinion; the degree of difficulty
                     in proving different kinds of statements; tests
                     involving the recognition of the tentative nature
                     of conclusions and generalizations in the social
                     studies; awareness of conflicting testimony; ability
                     to select kinds of data needed for a particular
                     problem; ability to suspend judgment.

9. Studying          Ability to select leading ideas; recognition of symbols,
   Materials         abbreviations, and allusions; kinds of notebooks
                     kept; speed of note-taking and quality of
                     notes; quality of outlines, abstracts, and summaries;
                     method and speed of locating a place on
                     a map; ability to select the right kind of map for
                     a given purpose; knowledge of the function of
                     colors in maps; familiarity with longitude and
                     latitude; completion exercises in map-reading;
                     exercises in interpreting cartoons, graphs, and
                     tables.

10. Utilizing        Ability to select the proper deduction following a
    Materials        generalization; ability to make a logical inference,
                     draw a proper conclusion, state a generalization;
                     ability to make correct citations and
                     bibliographies; ability to organize materials;
                     ability to recognize sequences, to establish causal
                     relationships; ability to select proper kind of
                     graph to embody given materials; ability to put
                     a group of headings in proper relationship.

11. Interests        Observation of choice of books from a varied assortment;
                     observations of those portions of a newspaper
                     which are being read after two minutes;
                     observations of subjects of magazine articles being
                     read after five minutes; the content of pupil
                     conversations; choice of projects and problems;
                     games played; questionnaires; shows attended;
                     record of hobbies; radio programs heard.

12. Cooperation      Check-lists of instances of voluntary cooperation;
                     check-lists with graded levels for indicating the
                     quality of cooperation; lists of achievements
                     which are the result of joint enterprises; the
                     number and efficacy of typical student-managed
                     organizations; check-lists of observance of courteous
                     demeanor; tests of attitude toward cooperation.

13. Suspended        A test consisting of sets of statements followed by
    Judgment         conclusions of which some are warranted and
                     others unwarranted; tests to measure the change
                     of opinions after hearing a speech, seeing a show,
                     reading a book; tests to see if pupils will refrain
                     from forming judgments on insufficient bases.

14. Toleration       Tests on racial and religious toleration; a check-list
                     of instances of favorable and unfavorable treatment
                     of minorities, such as foreigners, Negroes,
                     etc., in the school.


15 Edgar B. Wesley, Teaching Social Studies in High Schools, Third edition. D. C.
Heath and Co., Boston, 1950. p. 536-38.


5 CORRECTIVE WORK IN SOCIAL STUDIES 


Diagnosis and remedy in the social studies 


Diagnosis in the social studies is difficult because (1) the knowl- 
edges, skills, and understandings that pupils should acquire are not 
too clearly identified, and (2) if the facts to be learned were known 
accurately it would still be impossible to determine whether the 
pupil functioned in his social relationships in a desirable manner 
because of his possession of the informational elements revealed by 
a test. Diagnosis and remedy are often needed in those skills that are 
basic to successful work in the social studies. Instruction in these 
subjects requires much reading of the work-study type. Therefore, 
pupils, in order to achieve at acceptable levels, must possess many
of the following work-study reading skills: 


1. Knowledge of technical vocabularies employed in the social studies.

2. Reading comprehension adequate for interpretation of social science 
content. 

3. Ability to locate material readily—use of the index, library files, 
table of contents, maps, charts, etc. 

4. Ability to outline. 

5. Ability to summarize. 


These skills are discussed in Chapter 15, along with other ways 
and means for corrective work in these important acquisitive skills. 
They are not, therefore, taken up here. 


Topics for Discussion 


1. Define the field of the social studies in such a way as to clarify the
objectives adequately for testing purposes. 

2. Discuss the pros and cons of a unified social studies curriculum as 
contrasted with the traditional organization of the content by sub- 
jects. 

3. State four of the outcomes of instruction in the social studies. 

4. What are the three main types of social studies tests as specified in 
this chapter? 

5. In your judgment what is the relation of factual knowledges to 
ability in problem-solving in the social studies? 

6. What are the chief weaknesses in problem-solving tests and attitudes 
scales? 

7. Discuss the use of various objective test item forms in standardized 
social studies tests. 

8. Comment on some of the evaluative techniques of a non-test nature 
suggested for use in the social studies.


Measuring and Evaluating in
Mathematics 


THE FOLLOWING important points involved in the measurement and
evaluation of junior and senior high-school mathematics are
summarized in this chapter:

A. General and social significance of mathematics.

B. Objectives and outcomes of mathematics.

C. Arithmetic and general mathematics in the junior high school.

D. Measuring and evaluating in algebra.

E. Measurement in plane and solid geometry.

F. Aptitude and prognostic testing in mathematics.


1 GENERAL SIGNIFICANCE OF MATHEMATICS 


Social significance of mathematics 


Arithmetic and succeeding courses in mathematics constitute the 
intellectual background of exact thinking in the sciences and in the 
solution of many problems of modern society. For this reason
arithmetic has long been recognized as a most important tool subject.
Many think that failure to develop numerical concepts in high-school 
pupils and to acquaint them with the rudiments of higher mathe- 
matics is to neglect a most important educational responsibility. 
Babies come into a physical world in which quantity, shape, size, 
dimension, transformation, and relationship play important roles. 
The universe is unquestionably mathematical. Whatever may be the 
individual's background or inclinations toward mathematics, civilization,
and sciences, his modes of living and of thinking have a mathe-
matical core that he cannot escape. Mathematics, therefore, has a 
justifiable place in the secondary-school curriculum. If properly 
taught and enriched, it has the possibility of providing a type of 
training that is essential to every individual. It is considered by 
some to be one of the permanent bases of both a practical and a 
liberal education. 

A brief but very useful treatment of the educational and social importance 
of mathematics is presented in the Report of the Commission 
on Post-War Plans.¹ The discussion is organized around the 
following main points. 


1. Mathematics for Personal Use 
a. Mathematics in the Home 
b. Mathematics for Citizenship 
c. Mathematics for Intelligent Reading 
d. Mathematics for Everyday Workers 


2. Mathematics Used by Trained Workers 
a. Bookkeepers 
b. Clerical Workers 
c. Craftsmen 
d. Farmers 
e. Nurses 


3. Mathematics for College Preparation 
a. Type of College 
b. Specialization 


4. Mathematics for Professional Workers 
a. Teacher 
b. Statistician 
c. Engineer 
d. Surveyor 
e. Accountant 
f. Actuary 
g. Medical and Other Health Services 
h. Scientific Research 


5. Women in Mathematics 
6. Mathematics Used by Civil Service Workers 


1 “Final Report of the Commission on Post-War Plans of the National Council 
of Teachers of Mathematics.” Guidance Pamphlet in Mathematics for High School 
Students, Published by The Mathematics Teacher, Washington, D. C., November 
1947. 


Objectives and outcomes of mathematics 


The question of how much mathematics an individual must know 
is very difficult to answer. While everyday occupations involve con- 
siderable mathematical skill in simple arithmetic utilizing whole 
numbers, fractions, and decimals, there are many other fields of 
mathematical skill which have considerable social importance. These 
are well summarized in the following check list of 29 items: ² 


The Check List 

1. Computation. Can you add, subtract, multiply, and divide 
effectively with whole numbers, common fractions, and decimals? 

2. Percents. Can you use percents understandingly and accurately? 

3. Ratio. Do you have a clear understanding of ratio? 

4. Estimating. Before you perform a computation, do you estimate the 
result for the purpose of checking your answer? 

5. Rounding Numbers. Do you know the meaning of significant numbers? 
Can you round numbers properly? 

6. Tables. Can you find correct values in tables; e.g., interest and 
income tax? 

7. Graphs. Can you read ordinary graphs; bar, line, circle? The graph 
of a formula? 

8. Statistics. Do you know how to collect and interpret data; can you 
use averages, can you draw and interpret a graph? 

9. Nature of Measurement. Do you know the nature of a measurement, of 
a standard unit, of the largest permissible error, of tolerance, and of 
the statement that a measure is an approximation? 

10. Use of Measuring Devices. Can you use rulers, protractor, graph 
paper, tape, caliper, micrometer, and thermometer? 

11. Square Root. Can you find the square root of a number by table or 
by division? 

12. Angles. Can you estimate, read, and construct an angle? 

13. Geometric Concepts. Do you have an understanding of point, line, 
angle, parallel lines, perpendicular lines, triangle, parallelogram, 
trapezoid, circle, regular polygon, prism, cylinder, cone, and sphere? 

14. The 3-4-5 Relation. Can you use the Pythagorean relationship in a 
right triangle? 

15. Constructions. Can you with ruler and compasses construct a circle, 
a square, and a rectangle, transfer a line segment and an angle, bisect 
a line segment and an angle, copy a triangle, divide a line segment 
into more than two parts, draw a tangent to a circle, and draw a 
geometric figure to scale? 

16. Drawings. Can you read and interpret reasonably well, maps, floor 
plans, mechanical drawings, and blueprints? Can you find the distance 
between two points on a map? 

17. Vectors. Do you understand the meaning of vector, and can you find 
the resultant of two forces? 

18. Metric System. Do you know how to use the most important metric 
units? 

19. Conversion. In measuring length, area, volume, weight, time, 
temperature, angle, and speed, can you shift from one commonly used 
standard unit to another widely used standard unit; e.g., do you know 
the relation between yard and foot, inch and centimeter, etc.? 

20. Algebraic Symbolism. Do you understand the symbolism of algebra; 
do you know the meaning of exponent and coefficient? 

21. Formulas. Do you know the meaning of a formula? Can you write an 
arithmetic rule as a formula, and can you substitute given values in 
order to find the value for a required unknown? 

22. Signed Numbers. Do you understand signed numbers and can you use 
them? 

23. Using the Axioms. Do you understand what you are doing when you use 
the axioms to change the form of a formula or when you find the value 
of an unknown in a simple equation? 

24. Practical Formulas. Do you know from memory certain widely used 
formulas relating to areas, volumes, and interest, and to distance, 
rate, and time? 

25. Similar Triangles and Proportion. Do you understand the meaning of 
similar triangles, and do you know how to use the fact that in similar 
triangles the ratios of corresponding sides are equal? Can you manage a 
proportion? 

26. Trigonometry. Do you know the meaning of tangent, sine, cosine? Can 
you develop their meanings by means of scale drawings? 

27. Business Arithmetic. Have you a start in understanding the keeping 
of a simple account, making change, and the arithmetic that illustrates 
the most common problems of communications and everyday affairs? 

28. Stretching the Dollar. Do you have a basis for dealing 
intelligently with the main problems of the consumer; e.g., the cost of 
borrowing money, insurance to secure adequate protection against the 
numerous hazards of life, the wise management of money, and buying with 
a given income so as to get good values as regards both quantity and 
quality? 

29. Proceeding from Hypothesis to Conclusion. Can you analyze a 
statement and determine what is assumed, and whether the suggested 
conclusions really follow from the given facts or assumptions? 


2 Ibid., p. 4. 


A critical comparison of the foregoing check list and the content 
of the traditional four-year high-school course in mathematics 
indicates that most of these items would be encountered. Actually, 
many of the items would be mastered in a good course in business 
arithmetic, although many would be found only in a good course 
in shop mathematics. Others would almost certainly be found in the 
newer types of courses in general mathematics and consumer mathe- 
matics. 

There is reasonable ground for the conclusion that not all persons 
should be required to study mathematics. From the standpoint of 
utility there are relatively few people aside from the specialists who 
need much if any mathematics beyond the mere rudiments of arith- 
metic. There is also some question concerning the alleged cultural 
or disciplinary values of mathematics. In any event there is an ap- 
parently increasing tendency to make all mathematical work elective 
beyond the fundamentals as taught in the elementary grades. 

Courses in general mathematics are becoming increasingly popular 
in junior high schools and a good deal is being done toward the de- 
velopment of standardized tests in the subject. In view of the very 
complete treatment of the measurement of arithmetic in the com- 
panion volume on elementary-school tests only a brief consideration 
of that subject is presented here. Algebra, general mathematics, and 
geometry receive the major instructional emphasis in the high school 
and are accordingly given the most attention in this chapter. 


2 ARITHMETIC AND GENERAL MATHEMATICS IN THE JUNIOR-SENIOR 
HIGH SCHOOL 


Basic arithmetic skills 


Arithmetic is one of the more definite tool subjects, and much of 
its content is suitably organized for teaching purposes. For years it 
has been recognized that success in addition depends on a mastery of 
the basic addition facts. The same may be said of each of the four 
fundamental processes with whole numbers. Teachers now recognize, 
however, that success in such work as long division is dependent on 
a great many more skills than are involved in the mastery of the 
basic division facts. Long division calls for the accurate use of skills 
in addition, multiplication, and subtraction, not to mention the 
skills that are usually recognized as belonging definitely to division. 
Multiplication itself may involve the basic multiplication facts, the 
addition and multiplication involved in carrying in multiplication, 
and addition itself. A partial catalog of arithmetical skills selected 
for teaching, testing, and remedial purposes is presented here to illustrate 
the extent to which such an analysis may be carried as well 
as to furnish a broad basis upon which to build diagnostic and 
remedial material in this field. 


BASIC ARITHMETIC SKILLS 


I. Fundamental Processes with Whole Numbers 

1. Basic Addition Facts 
2. Basic Subtraction Facts 
3. Basic Multiplication Facts 
4. Basic Short Division Facts 
5. Higher Decade Addition 
6. Column Addition 
7. Carrying in Column Addition 
8. Harder Subtraction 
9. Borrowing or Carrying in Subtraction 
10. Addition Used in Harder Multiplication 
11. Carrying in Addition Used in Harder Multiplication 
12. Complete Process of Multiplication 
13. Short Division Involving Carrying 
14. Multiplication, Addition, and Subtraction Used in Long Division 
15. Complete Process of Long Division 


II. Fundamental Processes with Fractions and Whole Numbers 

1. Changing Fractions to Equivalent Forms 
2. Finding Common Denominators 
3. Reducing Fractions 
4. Addition of Fractions and Mixed Numbers 
5. Expressing Mixed Numbers as Improper Fractions 
6. Fundamentals of Subtraction of Fractions 
7. Reduction of Mixed Numbers 
8. Cancellation in the Multiplication of Fractions 
9. Multiplication of Fractions 
10. Cancellation in Division of Fractions 
11. Changing from Multiplication to Division Form 
12. Fundamentals of Division of Fractions 


III. Fundamental Processes with Decimals 

1. Notation of Decimals 
2. Changing Fractions and Mixed Numbers to Decimal Form 
3. Changing Decimals to Fractions and Mixed Numbers 
4. Fundamentals of Addition of Decimals 
5. Fundamentals of Subtraction of Decimals 
6. Pointing off in Multiplication of Decimals 
7. Dividing Decimals by Pointing off 
8. Location of Decimal Points in Division 
9. Changing Remainders to Decimal Form 
10. Fundamentals of Division of Decimals 


IV. Fundamental Processes with Denominate Numbers 

1. Reducing in Denominate Numbers 
2. Borrowing in Denominate Numbers 
3. Addition of Denominate Numbers 
4. Subtraction of Denominate Numbers 
5. Multiplication of Denominate Numbers 
6. Division of Denominate Numbers 


V. Mensuration 

1. Mensuration of Plane Surfaces 
2. Mensuration of Solids 
3. Finding Areas and Volumes 
4. Formulas Used in Mensuration 


VI. Percentage 

1. Fractional and Per Cent Relations 
2. Decimal and Per Cent Relations 
3. Expressing Areas in Per Cents 
4. Fundamentals of Work in Percentage 


VII. Interest 

1. Business Forms 
2. Budgets 
3. Computation of Interest 
4. Computation of Discount 
5. Use of Interest Tables 


VIII. Problem Solving 

1. Comprehension of Problem 
2. Knowledge of What Is Given 
3. Knowledge of What Is Called for 
4. Probable Answer 
5. Knowledge of Proper Processes and Proper Order of Processes 
6. Recognition of the Correct Solution 
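The point made before the catalog, that long division draws on several skills besides the division facts, can be illustrated with a short modern sketch. The Python function below is an illustration only and is not part of the original text; the name `long_division` and the skill labels are assumptions chosen for this example. It carries out the schoolbook procedure digit by digit and counts how often each contributing skill is called on.

```python
def long_division(dividend: int, divisor: int):
    """Schoolbook long division; returns (quotient, remainder, skill_counts)."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch handles a non-negative dividend and a positive divisor only")

    # Hypothetical labels for the sub-skills exercised at each step.
    skills = {"bring down": 0, "division fact": 0, "multiplication": 0, "subtraction": 0}
    quotient_digits = []
    remainder = 0

    for digit in str(dividend):
        # Bring down the next digit of the dividend.
        remainder = remainder * 10 + int(digit)
        skills["bring down"] += 1

        # Choose the quotient digit (a short-division fact).
        q = remainder // divisor
        skills["division fact"] += 1

        # Multiply the quotient digit by the divisor.
        product = q * divisor
        skills["multiplication"] += 1

        # Subtract the product to get the new partial remainder.
        remainder -= product
        skills["subtraction"] += 1

        quotient_digits.append(str(q))

    return int("".join(quotient_digits)), remainder, skills


quotient, remainder, skills = long_division(562, 7)
print(quotient, remainder)  # 80 2
print(skills)
```

Each digit of the dividend costs one division fact, one multiplication, and one subtraction, which is why a pupil who is weak in any one of these skills may fail at long division even though he knows the division facts.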


3 MEASUREMENT IN ARITHMETIC AND GENERAL MATHEMATICS 


Standardized testing in computational skills 


Computational skills are most often tested by an item type of 
simple recall form, although multiple-choice items are sometimes 
used. Such item types can be used with any combination of the four 
fundamental operations—addition, subtraction, multiplication, and 
division—and the four types of numbers—whole numbers, mixed 
numbers, fractions, and decimals. Some tests classify all items of a 
type together, while others use the “omnibus” arrangement of mixed 
order for the various operations and types of numbers. 

Simple recall items. Although these items are of simple recall form, 
it is by means of performing certain calculations rather than as recall 
that a pupil obtains the answers. Directions are usually given to the 
pupil concerning the form of answer desired, e.g., mixed numbers 
reduced to whole numbers and fractions, fractions reduced to lowest 
terms. Definite rules are also usually provided in order to objectify 
the scoring of a type of performance that is often viewed by different 
teachers according to very different standards. Credit is ordinarily 
given only for answers that are entirely correct. 


Sample A. 

1. Subtract    2. Subtract    3. Multiply    4. Divide      5. Add 
      58            456            105       562 ÷ 7 =      (addends garbled in the 
     −37           −107              6                      source: 31 ..., 17 ..., 24 ...) 
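The scoring rule described above, full credit only for an answer that is entirely correct and in the required form, can be sketched in a few lines of Python. This is a minimal modern illustration, not a procedure taken from any of the tests named in this chapter. The helper names `is_lowest_terms` and `score_item` are hypothetical, and the sketch handles whole numbers and simple fractions written as "n/d" only, not mixed numbers.

```python
from fractions import Fraction
from math import gcd


def is_lowest_terms(answer: str) -> bool:
    """True if a written answer such as '3/4' is already reduced; whole numbers pass."""
    if "/" not in answer:
        return True
    numerator, denominator = (int(part) for part in answer.split("/"))
    return gcd(numerator, denominator) == 1


def score_item(answer: str, key: Fraction) -> int:
    """All-or-none scoring: 1 only if the value is right and the form is reduced."""
    text = answer.strip()
    try:
        value = Fraction(text)
    except (ValueError, ZeroDivisionError):
        return 0  # unreadable answers earn no credit
    return int(value == key and is_lowest_terms(text))


# A pupil's paper: the value 6/8 is right but not in lowest terms, so no credit.
paper = [("21", Fraction(21)), ("3/4", Fraction(3, 4)), ("6/8", Fraction(3, 4))]
print(sum(score_item(answer, key) for answer, key in paper))  # 2
```

Writing the rule down this explicitly is what makes the scoring objective: two teachers applying it to the same paper must arrive at the same score.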


Multiple-choice items. Multiple-choice items require the pupils to 
perform the calculations in order to determine which is the correct 
answer, although there is usually no requirement that the pupil put 
down the work by which he obtained the answer. Some pupils might 
obtain the answers by mental computation and others by putting 
down only a skeleton of their computations. 


Sample B.³ 

41 Add      42 Subtract      43 Multiply 
    407          6002              265 
    819          3918              300 
    965 
    113 

41  [ ] 2204   [ ] 2305    [ ] 2314   [ ] N 
42  [ ] 2084   [ ] 2114    [ ] 3084   [ ] N 
43  [ ] 7950   [ ] 78,500  [ ] 7950   [ ] N 


Standardized testing in problem-solving 


Standardized tests in problem-solving are most frequently set up 
either in simple recall or in multiple-choice form. The five examples 
given below are sufficient to illustrate the testing method because of 
the similarity of problem-solving items in different tests. 

Simple recall items. Simple recall items in this situation require 
solutions of the problems, rather than recall in the usual sense of 
that word, in obtaining the answers. Scoring of responses is prac- 
tically always on an all-or-none basis, for no credit is given unless 
the answer is correct. Only one illustration of this item type is shown. 


Sample C.⁴ 


1. I bought an apple for 4 cents, a bowl of soup for 8 cents, and a 
cookie for 2 cents. All of the food cost how many cents? .......... 


2. John has 6 cents and wants to buy a ball that costs 15 cents. How 
many more cents does he need to buy the ball? 


Multiple-choice items. The multiple-choice item in problem-solv- 
ing also usually requires the solution of the problem in order to 
determine which of the alternative answers is the correct one. How- 
ever, some items require only an indication of the information neces- 
sary in a problem situation. 


3 E. F. Lindquist and others, Iowa Every-Pupil Tests of Basic Skills, Test D, 
Basic Arithmetic Skills, Advanced, Form Q. Published by State University of 
Iowa, 1946. 

4 Gertrude H. Hildreth, Arithmetic Achievement Tests, Grades 2 to 6. Published 
by Bureau of Publications, Teachers College, Columbia University, 1935. 


Sample D.⁵ 


Jean sold boxes of home-made fudge at 50¢ a box. Each box cost 30¢ 
to make and contained two dozen pieces. 


74. What was her profit on 8 boxes? 
75. What was the cost of making each piece? 
76. Her profit was what per cent of the selling price? 


74  [ ] $1.10   [ ] $1.60   [ ] $4.00   [ ] N 
75  (answer choices illegible in the source) 
76  (answer choices only partly legible in the source; 40% appears among them) 
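The three questions of Sample D each require a different operation on the same given facts. The short Python sketch below works them out; it is an illustration added for clarity and is not part of the test. The function name `sample_d` and its parameter names are assumptions made here, and exact fractions are used so that the cost per piece is not rounded.

```python
from fractions import Fraction


def sample_d(price_cents=50, cost_cents=30, pieces_per_box=24, boxes=8):
    """Work items 74 to 76 of Sample D from the facts stated in the problem."""
    profit_per_box = price_cents - cost_cents            # 50 - 30 = 20 cents

    total_profit_cents = boxes * profit_per_box           # item 74
    cost_per_piece = Fraction(cost_cents, pieces_per_box)  # item 75, in cents
    profit_percent = Fraction(profit_per_box, price_cents) * 100  # item 76

    return total_profit_cents, cost_per_piece, profit_percent


profit, per_piece, percent = sample_d()
print(profit)            # 160  (that is, $1.60)
print(float(per_piece))  # 1.25 (cents per piece)
print(percent)           # 40   (per cent of the selling price)
```

Item 76 shows why such problems test more than computation: the pupil must notice that the base of the percentage is the selling price, not the cost.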


Scope of diagnostic testing in arithmetic 


It is not by chance that the development of diagnostic tests has 
been largely confined to subject fields in which the aims are clean cut 
and the basic skills conditioning achievement have been analyzed 
carefully. Nor is it by chance that the blanket purposes of certain 
other subject fields, as expressed in courses of study and textbooks, 
have left the teacher groping vaguely for tangible goals and effective 
instructional methods. The order of development is clear: first, there 
must be a specific statement of aims lying back of the subject; 
second, a detailed analysis must be made of the basic skills upon 
which ultimate achievement depends; and third, material designed 
to give mastery of these skills must be prepared. 

Some progress has been made in the diagnosis of pupil defects in 
the field of arithmetic. This is possible because the aims of arith- 
metic are quite clearly stated, which in turn permits a rather detailed 
analysis of the underlying skills. As soon as it became known, for 
example, that the ability to do a certain type of column addition de- 
pends on the pupil's knowledge of certain higher-decade addition 
facts, it was possible not only to locate difficulties in teaching this 
material as such but also to furnish the teacher with specific aids in 
teaching it. The reason why similar material is not available in 
geography, history, science, and some phases of mathematics is that 
the aims of instruction in these fields have not yet become sufficiently 
crystallized to permit the type of analysis to which arithmetic has 
been subjected. 


5 Lindquist, op. cit. 


Meaning of problem-solving 


Problem-solving in mathematics and in the sciences is a compli- 
cated skill which is almost certainly highly related to general in- 
telligence. Naturally, an attempt to analyze and to identify the 
underlying skills meets with considerable difficulty. Thus far five 
fundamental steps in problem-solving, closely paralleling the steps 
in the thinking process outlined by Dewey,⁶ have been identified. 
These steps afford practically the only workable basis for an attack 
upon problem-solving difficulties. 

The first step in the solution of verbal problems demands a com- 
plete understanding of the elements and processes that are involved 
or implied. This is comprehension. This in itself involves many fac- 
tors, such as rate of reading, vocabulary difficulties, reading of 
numerals, and problem organization, as well as complexity in terms 
of the number and order of the mathematical processes involved. 
Underlying all of these is, of course, the ability of the pupil to hold 
the various facts and conditions in his mind long enough to analyze 
and organize them. This process of analysis and organization consti- 
tutes a second important step. The unnecessary facts or implications 
are discarded and only the significant data are retained. The third 
step in practice is actually a part of the second, for the recognition 
of the process involved is really a part of analysis. From this the 
worker moves straight to the fourth step, solution, where he applies 
to a specific situation his knowledge of the fundamental tools of 
number. In his earlier practice he has learned how to perform certain 
simple arithmetical computations. Now he learns when to apply 
them. The next and final step in the process is verification, which may 
be either a rough checking by the estimation of the probable answer 
to the problem or an actual recalculating and rechecking of the 
processes involved. 
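The two forms of verification named in the final step, a rough check by estimation and an actual recalculation, can be sketched as follows. This Python fragment is a modern illustration under stated assumptions: the function names, the choice to round to one significant figure, and the 25 per cent tolerance are all hypothetical and are not drawn from the text.

```python
from math import floor, log10


def round_sig(x, sig=1):
    """Round x to `sig` significant figures (the pupil's 'round numbers')."""
    if x == 0:
        return 0
    magnitude = floor(log10(abs(x)))
    return round(x, sig - 1 - magnitude)


def plausible(answer, estimate, tolerance=0.25):
    """Rough check: is the answer within `tolerance` of the estimate?"""
    return abs(answer - estimate) <= tolerance * abs(estimate)


def recheck_division(dividend, divisor, quotient, remainder):
    """Exact check: undo the division by multiplying and adding back."""
    return 0 <= remainder < divisor and quotient * divisor + remainder == dividend


# Checking 562 divided by 7, for which the pupil reports 80 remainder 2.
estimate = round_sig(562) / round_sig(7)       # 600 / 7, a little under 86
print(plausible(80, estimate))                 # True: 80 is a reasonable size
print(plausible(8, estimate))                  # False: a misplaced digit is caught
print(recheck_division(562, 7, 80, 2))         # True
```

The estimate catches gross errors of magnitude cheaply, while the recalculation catches small slips that an estimate would let pass; the two checks are complementary rather than interchangeable.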


Problem-solving exercises 


Skill in the solution of verbal problems is difficult to develop because 
of its complexity. The remedial aspect of the field of problem-solving 
largely remains to be developed. Since a complete sampling 
of the complex mass of skills involved in problem-solving cannot be 
made, only a few of the most important skills are included in the 
proposals of Table 34 for teacher-made remedial material. Practice 
on the silent reading comprehension of verbal problems of 
varying degrees of complexity is suggested in this analysis. The 
selection of the facts given in the problem that are essential to its 
solution responds to practice. Skill in comprehending the real problem 
setting by determining what is called for in the problem is also 
developed by practice on this type of exercises. Practice on the basic 
skill of choosing the correct processes in the more complex problems 
is also suggested. Skill in the verification of the solution or the 
estimation of the most probable answer to the problem may be 
developed by special exercises. 


6 John Dewey, How We Think. D. C. Heath and Co., Boston, 1910. 


TABLE 34. Analysis of problem-solving 

Steps in           Factors Underlying                         Proposed Types 
Problem-solving    Problem-solving                            of Drill 

Comprehension      Vocabulary.                                Multiple-Choice Comprehension 
                   Ability To Read Numerals.                    Exercises based on 
                   Ability To Read Rapidly.                     verbal problems. 
                   Ability To Comprehend. 
                     a. Follow directions. 
                     b. Make generalizations. 
                     c. Select potent elements. 
                     d. Discard irrelevancies. 
                     e. Determine problem setting as 
                        a unit. 
                     f. Determine the outcome of the 
                        problem. 
                     g. Grasp the significance of 
                        problem cues. 

Analysis and       Selection of Potent Elements.              What Is Given in Problems. 
Organization       Selection of Processes Involved.           Process Required in 
                   Determining What the Problem Calls for.      Solving Problems. 
                   Determining What Is Given in the           What Is Called for in 
                     Problem.                                   Problems. 
                   Determining the Process Relationships.     Problem Relationships. 

Recognition        Choice of Procedure.                       Process Required in 
                   Determining Problem Conditions.              Solving Problems. 
                   Determining the Purpose of the             What Is Called for in 
                     Problems.                                  Problems. 
                   Determining Relevant Elements.             What Is Given in Problems. 

Solution           Selection of Process.                      Process Required in 
                   Organization of Processes in Order.          Solving Problems. 
                   Knowledge and Application of               Working on Problem Scales. 
                     Combinations.                            Problem Relationships. 

Verification       Probable Form of Answer.                   Estimation of Probable 
                   Probable Magnitude of Answer.                Answers. 


Testing for general functional competence in mathematics 


In addition to the general survey tests of arithmetical skills that 
accompany the advanced examinations of most standardized achieve- 
ment batteries, such as the New Stanford Achievement Examination 
and the Metropolitan Achievement Tests, there are not many instruments 
designed for the measurement of general functional competence 
in mathematics. The Iowa Basic-Skills Tests, Test D, 
Advanced Examination, provide an excellent sampling into certain 
of the higher levels of arithmetical skills. The Davis Test of Func- 
tional Competence in Mathematics is one of the newest and most 
comprehensive tests of this type. The content of this test is based on 
the essentials for functional competence in mathematics as recom- 
mended by the Commission on Post-War Plans of the National 
Council of Teachers of Mathematics. The two parts and four sections 
measure the following areas: 


Part I. Section A. Consumer Problems (24 items) 
Section B. Graphs and Tables (9 items) 


Part II. Section A. Symbolism, Equations (24 items) 
Section B. Ratio, Tolerance (23 items) 


The total working time for the pupil is two class periods of 40 
minutes each. Middle-of-year and end-of-year percentile norms are 
available for Grades 9 to 12. 


Measuring and evaluating in general mathematics 


Two important factors have been operating in recent years to 
promote interest in courses in general mathematics. The first of these 
is the general reduction in the number of units of secondary-school 
mathematics required for college entrance. Perhaps this is closely 
related also to the fact that student failure in mathematics has been 
abnormally high in the secondary schools. Analysis of guidance information 
indicates that courses in first-year algebra and plane 
geometry are too specialized for the abilities and interests of many 
high-school pupils. The development and wide use of aptitude and 
prognostic instruments in the secondary-school mathematics areas 
has demonstrated that many pupils lack the abilities required for 
success in algebra and geometry courses. However, they are able to 
do entirely satisfactory work in properly organized general mathe- 
matics courses. Some training in mathematics beyond elementary- 
school arithmetic is undoubtedly desirable. General mathematics 
courses should meet this need. 

The objectives of the course in general mathematics determine the 
nature and the content of standardized objective tests in the subject. 
General mathematics courses are primarily of two types: (1) those 
designed for the non-mathematically inclined pupil with emphasis 
on the practical applications of mathematics, and (2) those that 
sample their content from the fields of arithmetic, algebra, geometry, 
and trigonometry. The Snader General Mathematics Test, one of the 
most recent tests in this area, is representative of the second type. 
Separate answer sheets of the multiple-choice type make for economy 
and speed in administering, scoring, and interpreting the tests. 


4 MEASUREMENT AND EVALUATION IN ALGEBRA 


Objectives of instruction in algebra 


Considerable attention has been given in the past few years to the 
restatement of objectives and to the reorganization of the subject 
matter of the secondary school. Mathematics, because of its cumu- 
lative character, is particularly sensitive to the reorganization of its 
content. Gradually a simplified program of instruction in algebra is 
being put into operation. Some of this simplification is due to the 
necessity of fitting the instruction to the maturity levels of the 
pupils. More of it is due, however, to the growing conviction on the 
part of teachers that the abstract symbolism of algebra has educa- 
tional significance only if it is clearly understood and is applied in 
purposeful situations. Algebra, because of these factors, is now being 
organized around a relatively small number of concepts and prac- 
tical topics. It includes the formula, the equation, and the graph, and 
as much purposeful technique as is necessary to make these topics 
effective for purposes of functional thinking and of problem-solving. 

The special purposes of the teaching of algebra may be summarized 
as follows: ⁷ 


1. To establish arithmetical skills more adequately and to extend the 
meaning and significance of arithmetic 

2. To strengthen the pupil's power in computation, by much practice 
as well as by the development of devices useful in computation 

3. To develop the ability to interpret and to construct graphs correctly 

4. To develop understanding of the “language” of algebra 

5. To develop the ability to understand, solve, transform, and apply 
formulae 

6. To develop knowledge of the equation and ability to apply it in the 
solution of problems of a wide range of interest, including large 
classes of problems often treated in arithmetic, as well as problems 
relative to geometry, to physics, and to other natural sciences 

7. To furnish such material within its domain as may be needed in the 
later study of mathematics and of the various physical sciences 


Measurable qualities in algebra 


An examination of the general objectives of instruction in algebra 
makes it clear that while there are numerous important specific skills, 
the general qualities of greatest significance deal with the development 
of speed and accuracy in the fundamental processes and in 
problem-solving. While reasonable speed in performing the funda- 
mental processes is desirable, the habit of mathematical accuracy 
must be developed in the pupils if algebra is to fulfill its primary 
obligation. Speed of work, though important, ranks as secondary to 
accuracy and reasoning power. The majority of tests in algebra 
emphasize the mechanics of algebra, in which speed and accuracy 
are primary elements. Much remains to be accomplished in the ef- 
fective analysis and measurement of the problem-solving abilities. 


Standardized achievement tests in algebra 


Two types of approach are noted among the more recently developed 
standardized tests in first-year algebra. One point of view, 
illustrated by the Seattle Algebra Test and the Lankton First-Year 
Algebra Test, is based on the belief that single and relatively brief 
end-of-semester and end-of-course tests represent adequate measures 
for the evaluation of accomplishment in the subject. The second type 
assumes that unit-type tests standardized at the time of the completion 
of instruction afford the most effective measures of accomplishment. 
The Larson-Greene Unit Tests in First Year Algebra are 
presented here as illustrations of this approach. 


7 Adapted primarily from: The Reorganization of Mathematics in Secondary 
Education. A Summary of the Report by the National Committee on Mathematical 
Requirements. Bureau of Education, Bulletin 1921, No. 32. Government Printing 
Office, Washington, D. C., 1922. 

The first of the tests mentioned above consists of a sampling of 47 
items covering vocabulary, fundamental processes, equations, and 
algebraic representations and problems. The items are completely 
objective and separate answer sheets that may be scored rapidly by 
hand or by machine are used. This test is designed for use at the 
end of the first half year of algebra. The accompanying excerpt shows 
sample items representative of each of the four parts of this test. 

The second of the tests is the Lankton First-Year Algebra Test.⁸ 
This is an end-of-course test consisting of 55 items on vocabulary of 
algebra, meaning and use of symbols, fundamental operations, for- 
mulas, equations, simple algebraic fractions, radicals, ratio, pro- 
portion, variation, graphs, trigonometric functions, and the solution 
of problems by the use of algebra. The general appearance and the 
testing techniques are quite similar to those indicated in the excerpt 
from the Seattle Algebra Test. 


Excerpt from Seattle Algebra Test ⁹ 


Part A. Vocabulary 

1. In 3a²c, the c is 
1. a term. 
2. a binomial. 
3. an exponent. 
4. a factor. 
5. a numerical coefficient. 


Part B. Fundamental Processes 

10. (−2)(−2)(−2) equals 
a. −8 
b. −6 
c. +6 
d. +8 
e. none of the above 


8 Robert Lankton, Lankton First-Year Algebra Test. Published by World Book 
Co., 1950. 

9 Herold B. Jeffery and others, Seattle Algebra Test. Published by World Book 
Co., 1951. 


Part C. Equations 


31. н = 6, then x equals 


8:5 

b. 4 

CCS 

di? 

e. none of the above 


Part D. Algebraic Representation and Problems 


Directions. In the following questions, read each problem and decide 
which of the five given algebraic expressions or equations is correct. Do 
not solve the equations. 


40. If n represents an odd number, the next higher consecutive odd 
number is 

a. 2n 
b. n + 1 
c. n + 2 
d. un 
e. n 
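Two of the excerpted items can be checked mechanically, which shows what each one actually measures. The Python sketch below is a modern illustration and not part of the test; the function name `next_odd` is an assumption made here. It confirms the sign rule behind item 10 and tests two of the legible expressions offered in item 40, 2n and n + 2, against several odd values of n.

```python
from math import prod


def next_odd(n: int) -> int:
    """Return the next higher consecutive odd number after an odd n."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    return n + 2


# Item 10: a product of three negative factors is negative.
print(prod([-2, -2, -2]))  # -8

# Item 40: compare candidate expressions with the true successor.
odd_values = [1, 3, 7, 15, 99]
print(all(n + 2 == next_odd(n) for n in odd_values))   # True
print(any(2 * n == next_odd(n) for n in odd_values))   # False: 2n is always even
```

Item 10 calls only for the mechanical rule of signs, whereas item 40 requires the pupil to translate a verbal relationship into symbols, which is the harder and more significant ability.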


The Larson-Greene Unit Tests in First-Year Algebra ¹⁰ comprise 
six tests in a 24-page booklet designed to cover the basic phases of 
first-year algebra. Separate quick-scorable answer sheets are provided. 
The tests are standardized for periodic use in any order 
through the school term following the completion of instruction on 
the specific subject-matter units. Tests 1, 2, and 3 cover the work 
normally taught in the first semester. Tests 4, 5, and 6 cover the second 
semester’s work. No one of the tests is intended to be used as a 
semester final or end-of-course examination. For the teacher the 
primary advantage of a unit test is that results from the various 
parts become available in time for remedial work to be undertaken. 


10 Robert Larson and Harry A. Greene, Larson-Greene Unit Tests in First-Year 
Algebra. Published by Bureau of Educational Research and Service, State Uni- 
versity of Iowa, Iowa City, 1947. 
5 MEASUREMENT IN PLANE AND SOLID GEOMETRY


Objectives of instruction in plane geometry 


The more important general objectives of high-school geometry 
may be summarized as follows: 11


1. Development of logical reasoning ability

2. Development of an appreciation of the utility and beauty of geo- 
metrical forms 

3. Familiarization of the student with the properties, mensuration, and 
relationships of common geometric forms 

4. Development of an understanding and an appreciation of deductive 
proofs 

5. Creation of an understanding of spatial concepts and relations 

6. Establishment of habits of precision and accuracy 

7. Development of an appreciation of the part geometry has played in 
the history of civilization 


Possibilities of measurement in plane geometry 


The general nature of the objectives of instruction in geometry 
makes it necessary to state these outcomes in very definite form 
before the measurable qualities in the subject can be identified. The 
accompanying summary of knowledges, skills, and abilities is at the 
same time a catalogue of those elements of plane geometry that lend 
themselves most readily to objective measurement. This list is a 
compilation based on numerous statements of objectives, textbooks, 
tests, and courses of study. 

1. Ability to visualize or think in terms of visual imagery

2. Ability to read geometrical material, theorems, and problems

3. Ability to use efficiently the basic arithmetical processes required

4. Ability to apply the algebraic processes required

5. Ability to state what is given in the theorem or problem

6. Ability to state what is to be proved in the theorem or problem

7. Ability to state in proper and logical order the essential steps in
the proof or solution of the problem


11 Edwin S. Lide, Instruction in Mathematics. National Survey of Secondary 
Education, Monograph No. 23, U. S. Office of Education Bulletin, 1932, No. 17. 
Government Printing Office, Washington, D. C., 1933. p. 47.
8. Ability to recall or select additional facts necessary for the proof
or solution

9. Ability to detect errors in reasoning in a proof or solution

10. Ability to detect errors in the mathematics of the proof or solution

11. Ability to draw figures required by a proof or problem

12. Ability to detect errors in figures required by problem 

13. Ability to do certain constructions 

14. Ability to detect errors in the steps in a construction or in the 
construction itself 

15. Ability to apply geometric principles to real life problems 


Not all of these elements of accomplishment in plane geometry 
are measured directly in any one series of tests, although many may 
appear incidentally in connection with the total process of proving 
a theorem or solving a problem. While everyone recognizes the 
importance of developing the pupil's power to reason logically from 
given data as a most important outcome in geometry, it must never- 
theless be admitted that the most effective measurement of such 
reasoning power is attained through his success in the application 
of specific abilities and skills.


Objectives of instruction in solid geometry 


The instructional purposes of solid geometry are given expression 
in the following recommendation of the National Committee on 
Mathematical Requirements.12


The aim of the work in solid geometry should be to exercise further 
spatial imagination of the student and to give him both a knowledge of 
the fundamental spatial relationships and the power to work with them. 
It is felt that the work in plane geometry gives enough training in logical 
demonstration to warrant a shifting of emphasis in the work on solid 
geometry away from this aspect of the subject and in the direction of 
developing greater facility in visualizing spatial relations and figures,
representing such figures on paper and in solving problems in mensuration.


In general, pupils in solid geometry courses are expected to attain
familiarity with the fundamental formulas of plane geometry, the
use of logarithms, and the use of natural and logarithmic functions
(including sine, cosine, and tangent). Furthermore, they are generally
expected to become familiar with the solution of triangles and with the
formulas of solid geometry. The subject is generally elective, and fewer
pupils elect it than elect algebra and plane geometry. Perhaps this is one
of the reasons why relatively little attention has been given to the
construction of standardized tests for this subject.


12 National Committee on Mathematical Requirements, The Reorganization of
Mathematics in Secondary Education. Mathematical Association of America,
Oberlin, Ohio, 1923.


Standardized testing in geometry 


As in the case of first-year algebra, there are two typical approaches
to standardized testing in plane geometry: end-of-course achievement
tests and unit tests adjustable to the order of presentation of
the subject-matter units. The Seattle Plane Geometry Test and the
Shaycoft Plane Geometry Test are two recent examples of the first
type. The Lane-Greene Unit Tests in Plane Geometry are representative
of the second approach.

The Shaycoft Plane Geometry Test 13 is a 60-item booklet covering
the objectives of the typical one-year course in plane geometry. The
Seattle Plane Geometry Test consists of 45 items designed to cover
the emphasis of the first half year of the course. Table 35
shows the subject-matter coverage and the number of items in each
of the parts of the test.


TABLE 35. Summary of parts and items: Seattle Plane Geometry Test 14


Part  Content                   Number of items
A     Vocabulary                12
B     Construction              11
C     Computations              10
D     Reasoning from a Figure   12


The Lane-Greene Unit Tests in Plane Geometry consist of six tests
standardized for use in any desired order immediately following the
completion of the teaching of each unit. In general, Tests 1, 2, and 3
comprise the usual coverage of the first semester and Tests 4, 5,
and 6 the units normally taught in the second semester. Several
novel testing techniques are utilized. No one of the tests is designed
to serve as a semester final or end-of-course final test. As in the case
of the unit tests in algebra, the advantage of these unit tests lies in
the fact that the results become available to the pupil and the teacher
before the completion of the entire course and in time for supplementary
instruction to be undertaken if necessary. Separate quick-scorable
answer sheets are provided for use with the 32-page test booklets.


13 Marion F. Shaycoft, Shaycoft Plane Geometry Test. Published by World Book
Co., 1950.
14 Harold B. Jeffery and others, Seattle Plane Geometry Test. Published by
World Book Co., 1951.

Table 36 summarizes the content coverage, the number of items, and
the working time for each of the six tests.


TABLE 36. Summary of parts, items, and timing: Lane-Greene Unit Tests
in Plane Geometry 15


Test  Title                              Number of items  Timing
1     Fundamental Ideas of Geometry      35               35
2     Parallel Lines and Triangles       60               38
3     Rectilinear Figures                60               38
4     The Circle                         45               38
5     Proportion and Similar Polygons    45               36
6     Areas of Polygons                  45               36
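

The figures in Table 36 can be totaled by semester as a quick check on the size of the testing program. The following is only a minimal sketch: the variable and function names are illustrative, the rows are copied from the table as printed, and the unit of the timing column is not stated in the table, so it is left unlabeled here.

```python
# Rows of Table 36 as printed: (test number, title, number of items, timing).
# The timing unit is not given in the table and is therefore not assumed here.
TABLE_36 = [
    (1, "Fundamental Ideas of Geometry", 35, 35),
    (2, "Parallel Lines and Triangles", 60, 38),
    (3, "Rectilinear Figures", 60, 38),
    (4, "The Circle", 45, 38),
    (5, "Proportion and Similar Polygons", 45, 36),
    (6, "Areas of Polygons", 45, 36),
]


def totals(rows):
    """Return (total number of items, total timing) for the given rows."""
    return sum(r[2] for r in rows), sum(r[3] for r in rows)


# Tests 1-3 are described as first-semester work, Tests 4-6 as second-semester.
first_semester = totals([r for r in TABLE_36 if r[0] <= 3])
second_semester = totals([r for r in TABLE_36 if r[0] >= 4])
whole_series = totals(TABLE_36)

print("First semester (items, timing): ", first_semester)
print("Second semester (items, timing):", second_semester)
print("Whole series (items, timing):   ", whole_series)
```

The split into semesters follows the statement that Tests 1, 2, and 3 usually cover the first semester and Tests 4, 5, and 6 the second.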


6 MEASUREMENT IN TRIGONOMETRY 


Instructional aims of trigonometry 


Measurement in trigonometry is concerned with two major elements:
(1) an understanding of the fundamental propositions of
trigonometry and familiarity with the fundamental relations between
the ratios; and (2) the ability to solve problems involving the
applications of such understandings. Trigonometry is usually offered
as an elective subject in the junior or senior year of high school, and
ordinarily is taken only by those pupils whose interests lie in the
field of higher mathematics.


15 Ruth O. Lane and Harry A. Greene, Lane-Greene Unit Tests in Plane
Geometry. Published by Bureau of Educational Research and Service, State Uni- 
versity of Iowa, Iowa City, 1944. 
Standardized testing in trigonometry


The few standardized tests available for trigonometry rely primarily
on the multiple-choice item, in the several variations already
illustrated for algebra and geometry in the preceding sections of this
chapter. Because of the similarity in item form, no examples for this
subject are given here.


7 APTITUDE AND PROGNOSTIC TESTS IN MATHEMATICS 


Aptitude and prognostic tests 


The distinction between aptitude and prognostic tests was brought
out in Chapter 3 of this book. It is sufficient to say here, therefore, 
that the two types of tests are treated together in this section because 
of the similarity of purposes for which they are widely used. 


Prediction of success in first-year algebra 


Teachers and high-school supervisors have long been impressed 
with the excessive difficulty that pupils encounter in first-year 
algebra. These difficulties are shown in two ways: (1) in the high
percentage of pupil failure in the subject, and (2) in the large amount 
of extra help demanded by pupils outside of the classroom if failure 
is to be avoided. Because of the large amount of wasted time and 
effort, there is increasing interest in tests designed to predict pupil 
success in specific fields. In general, two somewhat different types of 
procedure have been tried as a basis for prognosis. These are (1) the 
learning technique, in which the aptitude of the pupil is measured in 
terms of the speed and accuracy with which he is able to acquire 
skills and information in the new field and respond to objective tests 
over the newly learned material, and (2) the inventory technique, 
in which he reveals his aptitude in terms of reactions to specific 
exercises sampling the underlying skills on which success in the
subject depends. The Orleans Algebra Prognosis Test is of the former
type, while the Iowa Algebra Aptitude Test, Revised, the Iowa Placement
Examination, Mathematics Aptitude, the Lee Test of Algebraic
Ability, and the California Algebra Aptitude Test are of the latter
type.

The Orleans Algebra Prognosis Test is an aptitude test of the learning
type. It gives a basis for predicting the pupil's success in the
subject by measuring the speed and accuracy with which he is able 
to learn novel material of the sort encountered in algebra. The test 
contains eleven simple lessons with a test on each covering funda- 
mental principles and essential skills in learning algebra. An arith- 
metic and a summary test are also included. 

For the Iowa Algebra Aptitude Test, Revised, four basic skills
were selected from the large number of possible factors that were 
demonstrated to be highly related to achievement in first-year al- 
gebra. These skills, none of which involves algebraic ability as such, 
deal with (1) arithmetic computations, (2) computations involving 
abstract concepts, (3) manipulation of numerical series, and (4) 
solution of problems involving dependence and variation. Pupils 
who score below the twentieth percentile of the test norms are 
almost certain to fail the course, or must practically be carried 
through the course by the teacher if they succeed in passing. Such 
pupils probably should be diverted into courses in general or applied 
mathematics or into other fields of study where they are more likely 
to succeed. 
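
The twentieth-percentile rule mentioned above can be illustrated with a small calculation. This is only a sketch under stated assumptions: the norm scores are invented for illustration, the function names are hypothetical, and percentile rank is taken here as the percentage of the norm group scoring strictly below the pupil, which may differ from the convention used in the published test norms.

```python
from bisect import bisect_left


def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring strictly below `score`."""
    ordered = sorted(norm_scores)
    if not ordered:
        raise ValueError("norm_scores must not be empty")
    below = bisect_left(ordered, score)  # count of norm scores < score
    return 100.0 * below / len(ordered)


def needs_guidance(score, norm_scores, cutoff=20.0):
    """True when the pupil falls below the cutoff percentile of the norms."""
    return percentile_rank(score, norm_scores) < cutoff


# Hypothetical norm group: one pupil at each raw score from 1 to 100.
hypothetical_norms = list(range(1, 101))

for raw_score in (12, 20, 21, 55):
    rank = percentile_rank(raw_score, hypothetical_norms)
    flag = needs_guidance(raw_score, hypothetical_norms)
    print(f"raw score {raw_score}: percentile rank {rank:.0f}, flag for guidance: {flag}")
```

A flag of this kind is a prompt for guidance, in the sense of the paragraph above, and would ordinarily be weighed with other evidence about the pupil.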


Prediction of success in plane geometry 


Prognostic tests in plane geometry are similar to those in algebra 
with respect to their major types. The Orleans Geometry Prognosis 
Test is similar to the algebra test by the same author in utilizing the 
learning procedure as the basis for prediction. The Iowa Plane
Geometry Aptitude Test, Revised, and the Lee Test of Geometric
Aptitude, as is true of their counterparts in algebra, approach the
problem by means of the inventory method. The four skills chosen
for measurement in the Iowa Plane Geometry Aptitude Test, Revised,
for example, deal with (1) reading of geometry content, (2) algebraic
computations, (3) arithmetical and algebraic reasoning, and
(4) visualization. The tests are useful guidance and supervisory tools 
in the same manner as was indicated above for aptitude and prog- 
nostic tests in first-year algebra. 
Topics for Discussion


1. In your opinion why does the content of high-school mathematics
courses vary or change less than does the content of the sciences? 

2. Which of the 29 objectives in the check list in this chapter would 
be met adequately by a course in advanced arithmetic? By a course 
in general mathematics? 

3. Identify some of the more important of the specific skills in arith- 
metic that appear to lend themselves to measurement and remedial 
treatment. 

4. What in your judgment accounts for the fact that algebra and 
geometry have not been subjected to the same detailed analysis 
and extensive measurement as is true of arithmetic? 

5. Outline a plan by which you could give helpful guidance to high-
school pupils concerning the type of mathematics they should elect. 

6. Summarize the comparative values of end-of-course tests and unit 
tests in the different fields of mathematics. 

7. To what extent do you find that there is justification for depend- 
ing upon transfer from algebra to the learning of geometry? 

8. Why are there few or no adequate diagnostic tests in algebra or 
geometry? 

9. Make a comparison of any two selected algebra or geometry tests 
on the complete lists of specific skills measured. 

10. Show how the basic steps in problem-solving in algebra and
geometry closely parallel the steps in the thinking process. 

11. Evaluate the two types of techniques that are used as the basis
for prognosis in algebra and geometry. 

12. What procedures can you suggest for the improvement of initial 
instruction on problem-solving in algebra and geometry? 



20 


Measuring and Evaluating in the Sciences


THIS CHAPTER presents a discussion of the following points involved
in the measurement and improvement of instruction in the secondary-school
sciences:


A. Objectives of the sciences.

B. Outcomes of the sciences.

C. Measurement in the sciences.

D. Standardized tests in the sciences.

E. Testing methods in the sciences.

F. Informal objective testing of science outcomes.


This chapter supplements Chapter 18 by furnishing a further 
discussion of the problems of measurement in the content subjects. 
It deals primarily with the science fields of general science, biology, 
physics, and chemistry. 

Adequate science instruction should be expected as a matter of 
course in an age of science like the present. It is rather surprising 
to note, however, that progress in the selection and organization of 
science content and improvement in teaching and testing methods 
and materials in recent years have been relatively slight, in spite of 
the great practical value of science and its natural appeal to the 
curiosity and interests of pupils. Science as an elementary-school 
subject has made but few contributions of experience or enrichment
to the progress of education, and the secondary-school science fields 
are not much more significant in this respect. Apparently most of 
the adult population have learned to make their adjustments in this
scientific age through experiences they have had outside the school.


1 SCOPE OF THE SCIENCES


Objectives of the sciences 


An examination of numerous modern sources fails to disclose any 
new, clear-cut, and realistic statements of objectives for the ele- 
mentary sciences. Aims and objectives appear to be more satisfactory 
for science in general and for the secondary school, however. The 
following list for science in general includes eight types of objectives 
but in each instance provides only a few of the illustrations given to 
show how the objectives can be attained.1


A. Functional information or facts about such matters as: 
1. Our universe: earth, sun, moon, stars, weather, and climate.
2. Living things: plants and animals.
3. The human body: structure, functions, and care.
4. Energy: sources, types of energy, machines.


B. Functional concepts, such as: 


1. Space is vast. 
2. The earth is very old. 
3. All life has evolved from simpler forms. 


C. Functional understanding of principles, such as: 


1. All living things reproduce their kind.
2. Energy can be changed from one form to another.
3. All matter is composed of single elements or combinations of
elements.


D. Instrumental skills, such as:


1. Read science content with understanding and satisfaction.
2. Perform fundamental operations with reasonable accuracy.
3. Read maps, graphs, charts, and tables and interpret them.
4. Make accurate measurements, readings, titrations, etc.


E. Problem-solving skills, such as ability to:


1. Sense a problem.
2. Define the problem.
3. Select the most likely hypothesis.
4. Test the hypothesis by experimental or other means.
5. Draw conclusions.


1 Victor H. Noll, chairman, “The Objectives of Science Instruction.” Science
Education in American Schools, Forty-Sixth Yearbook of the National Society for
the Study of Education, Part I. University of Chicago Press, Chicago, 1947. p. 28-29.
Quoted by permission of the Society.


F. Attitudes, such as: 
1. Open-mindedness: willingness to consider new facts.
2. Intellectual honesty: scientific integrity, unwillingness to compromise
with truth as known.


G. Appreciations, such as: 


1. Appreciation of the contributions of scientists. 
2. Appreciation of basic cause-and-effect relationships. 


H. Interests, such as: 
1. Interest in some phase of science as a recreational activity or
hobby. 
2. Interest in science as a field for a vocation. 


Noll 2 analyzed and classified science objectives from 130 sources
and tabulated the frequency with which each type of objective was 
listed for the junior high school and senior high school in the various 
sources. The objectives are listed below by types and in order of 
frequency of listing at the senior-high-school level. 


A. Knowledges 


1. Knowledge of the principles and applications of science.

2. Knowledge leading to an understanding of the nature and organization
of the environment.

3. Preparation for further work in science and for college entrance.

4. Exploration to acquaint the pupil with science and to help him
to orient himself with respect to the different sciences.


B. Appreciations 


1. Appreciation of the beauties of nature and of the commonplace.
2. Appreciation of the work of scientists.


C. Abilities 


1. Ability to use the scientific method.
2. Ability to do useful tasks. 


D. Habits 


1. Desirable habits of work and study.
2. Habits of healthful living. 


2 Victor H. Noll, The Teaching of Science in Elementary and Secondary Schools.
Longmans, Green and Co., New York, 1939. p. 7-10. 
E. Interests


1. Interest in science.
2. Interest in environment. 


F. Attitudes 
1. Scientific attitude.


Although knowledge outcomes were ranked first in both lists, they 
received considerably more emphasis for the senior-high-school 
courses. Whereas habits and appreciations ranked second and third 
in the junior-high-school list, appreciations and abilities took second 
and third places in the senior-high-school list. The major differences 
were the increased emphasis upon knowledges and abilities and the 
reduced emphasis upon habits at the senior-high-school level. The 
emphasis upon the knowledge objectives was slightly greater than 
that upon the other five types of objectives combined. 


General outcomes of the sciences 


The sciences must be viewed from two rather specific points of 
view—for their immediate educational values for high-school pupils 
and for the background of preparation they afford for the later more 
intensive and specialized study of the sciences by those who continue 
into college. Educational values of real significance will be attained if 
pupils, as a result of such instruction, acquire (1) the ability to use
the scientific findings that apply in their experiences, (2) the ability 
to interpret natural phenomena in their environments, and (3) an 
appreciation of scientific attitude through understanding of and 
ability to use some of the methods of study that have been employed 
by scientists.3

The question of organization of courses arises here as it does in 
the social studies—whether the sciences should follow the traditional 
subject divisions or be integrated to produce a unified course of study. 
The tendency in the most progressive schools is toward unification. 
On the whole this movement has met with more general approval in 
the elementary grades than it has at the secondary-school or college 
levels, however. The senior-high-school sciences are likely to be quite 
specific and to involve sampling much more deeply into limited fields. 

3 S. Ralph Powers, “The Plan of the Public Schools and the Program of Science
Teaching.” A Program for Science Teaching, Thirty-First Yearbook of the National
Society for the Study of Education, Part I. Public School Publishing Co.,
Bloomington, Ill., 1932. p. 10.

Regardless of the desirability of developing an integrated course of
study, most of the secondary-school science now taught is presented 
in the form of separate courses in general science, biology, physics, 
and chemistry. The scope of each of these is discussed here without 
reference to their possible integration. 

General science. Most intelligent adjustments, as distinguished 
from those that are purely accidental, impulsive, or habitual, are 
dependent upon scientific procedures. Everyone is called upon to 
make such responses in connection with his home, his neighborhood, 
his vocation, his civic duties, and his leisure. He is frequently con- 
fronted with a need for some special knowledge of health control, 
mechanics, chemistry, physics, biology, or plant and animal life. 
At almost every hour of the day the individual is in the midst of the
influence of mechanical and scientific appliances. For their operation, 
maintenance, adjustment, and repair, and as a protection from their 
dangers, he needs information and first-hand experience of the type 
obtained in general science. 

Biology. Typically following general science in the high-school 
course of study, biology is concerned with the physical and mental 
health and the environments of pupils. In so doing, it emphasizes the 
maintenance of life, interrelationships among forms of life, de- 
pendence of man on his physical environment, and man's ability to 
control his environment. Some of the direct concerns of this course 
typically are health education, sanitation, physiology, sex education, 
first aid, the mechanics of heredity, cultivation of plants, behavior 
and control of animals, and conservation of natural resources. The 
course in biology serves both as a terminal science course for many 
pupils and as a foundation for physics, chemistry, zoology, and 
botany for other pupils. 

Physics and chemistry. These two advanced and relatively spe- 
cialized courses, in contrast with biology, deal much more with in- 
organic than organic matter at the high-school level. Correlation 
of physics and chemistry with life is important, and preparation for 
advanced study at the college level is also a necessary concern. 
Physics deals with the transformation and conservation of energy, 
with heat, light, mechanics, and electronics, and with fundamentals 
of atomic energy. Chemistry treats the elements, their valences, and 
the periodic table, chemical compounds and their formation, chem- 
ical symbolism, and equations and their balancing. Both physics and 
chemistry stress certain laboratory and mathematical competencies. 
Basic principles and generalizations of science


To supplement these somewhat general objectives of science in- 
struction, a list is given below to identify some of the larger and 
more important scientific concepts. The Committee on the Teaching
of Science suggested the following principles and generalizations,
which it considered of value for guidance in selecting and organizing
content for science teaching: 4


1. The sun is the chief source of energy for the earth.
2. Through interdependence of species and the struggle for existence 
a balance tends to be maintained among the many forms of life. 
3. The earth's position in relation to the sun and moon is a determin- 
ing factor of life on earth. 
4. All life comes from life and produces its own kind of living or- 
ganism. 
5. Matter and energy cannot be created or destroyed, but may be 
changed from one form to another. 
6. Species have survived because of adaptations and adjustments 
which have fitted them to the conditions under which they live. 
7. The energy of solar radiation is continually working changes in 
the surface of the earth. 
8. There have been profound changes in the climate, not only of certain
regions, but also of the earth as a whole.
9. The evolution of the earth has come as a result of natural forces. 
10. Units of time are defined by the earth's movements in relation to 
the sun. 
11. All life has evolved from simple forms.
12. The earth seems very old when its age is measured in the ordinary 
units of time. 
13. Distances in space seem extremely vast when compared with dis- 
tances on earth. 
14. The physical environment has great influence on the structural 
forms of life and on plant and animal habitats. 
15. Man can modify the nature of plant and animal forms through 
application of his knowledge of the laws of heredity. 
16. There is a great variety in the size, structure, and habits of living 
things. 
17. There are processes that go on within an organism that are vital to 
its continued existence. 


4 S. Ralph Powers, chairman, “The Objectives of Science Teaching in Relation
to the Aim of Education.” A Program for Science Teaching, Thirty-First Yearbook
of the National Society for the Study of Education, Part I. Public School
Publishing Co., Bloomington, Ill., 1932. p. 53-55.
18. Chemical and physical changes are manifestations of energy
changes.
19. There are fewer than one hundred chemical elements.
20. Every substance is one of the following: (a) a chemical element,
(b) a chemical compound, (c) a mechanical mixture.
21. Certain material substances and certain physical conditions are
limiting factors to life.
22. Light is a limiting factor to life.
23. Sound is caused by waves which are produced by a vibrating body
and which can affect the auditory nerves of the ear.
24. Gravitation is the attractive force that influences or governs the
movements of astronomical bodies.
25. Machines are devices for accomplishing useful transformations of
energy.
26. Any machine, no matter how complicated, may be analyzed into a
few simple types.
27. The properties of the different elements depend on the number and
arrangement of the electrons and protons contained in their atoms.
28. All matter is probably electrical in structure.
29. The applications of electricity and magnetism in the home and in
industry have revolutionized the methods of living of many people.
30. Heredity determines the differences between parents and offspring
as well as the resemblances.
31. The kinetic energy of the molecules determines the physical states
of matter.
32. The gravitational attraction between the earth and a mass of
unconfined gas or liquid causes the pressure of the liquid or gas on
the surface of the earth.
33. Liquid or gas pressure is exerted equally in all directions.
34. Chemical changes are accompanied by energy changes.
35. A change in rate or direction of motion of an object requires the
application of an external force.
36. Radiant energy travels in straight lines through a uniform medium.
37. Electricity is a form of energy that results from disturbing the
position or the regular paths of electrons.
38. In a chemical change a quantitative relationship exists between the
amounts of substances reacting and the amounts of the substances
that are the products of the reaction.


This list of principles and generalizations gives an idea of the 
breadth and depth of the field and also a clue to the types of ob- 
jectives to be sought in science instruction. 
2 MEASUREMENT IN THE SCIENCES 


Difficulties in constructing science tests 


The construction of science tests should apparently be relatively 
simple, since the content of science is quite tangible. However, diffi- 
culties of a degree no less marked than in the other content subjects 
are encountered. There is much the same lack of agreement on the 
content of the course of study and its organization that is found in the 
social sciences. Controversies about the importance of facts as con- 
trasted with emphasis on relationships and problem-solving are still 
somewhat in evidence in science teaching, although science teachers 
have increasingly of late given attention to the more intangible out- 
comes of instruction. There is very little objective evidence on what 
particular skills and principles, or what elements in and safeguards 
to scientific thinking, are of most importance or can best be imparted 
in the secondary-school sciences. The typical science course appar- 
ently attempts to accomplish little more than to give a knowledge 
of the names of a few of the common animals, plants, and physical 
objects, and an acquaintance with a few of the simpler natural 
phenomena, without any very definite purpose appearing to justify 
the accumulation of such information. 

Real evidences of accomplishment in the sciences are to be found 
in the development and the direction of pupils’ interests, attitudes, 
appreciations, skills, habits, and actions in these fields. The ideal 
way to determine the changes that are effected in the pupil as a 
result of studying a unit in science would be to measure the incre- 
ment of desirable activities that he can and does perform as a result 
of this study. However, only a few attempts to devise tests for such 
a purpose have so far been made. 


Measurable outcomes of science


Five major types of measurable qualities are designated in the
sciences.

Knowledges. Most tests in science tend to overemphasize informa- 
tion and knowledge as the goal of study. It is too often assumed that 
knowledge is a positive index of satisfactory modes of adjustment. 
This assumption, of course, is only partially defensible. Merely to 
know is no assurance of subsequent proper reaction. But insofar as 
knowledge is essential to adjustment its proper worth should not be 
discounted. Accordingly, measurement of the pupil's knowledge of 
scientific facts is to that extent valid and defensible. 

Skills. Laboratory and other science skills are directly involved 
in the secondary-school sciences, and the degree to which pupils attain 
the desired skills can be measured readily. Performance rather than 
paper-and-pencil tests are often demanded in such situations. Since 
performance tests are treated in Chapter 8, the measurement of skill 
outcomes in the sciences receives little attention in this chapter. 

Concepts and understandings. Facts in science are the vehicles for 
thought. The understanding of the relationships of facts and of 
generalized ideas is deemed most important. It is these generalized 
ideas that pupils should attain in their study of science. Tests should, 
therefore, so far as possible, measure the relational aspects of science, 
and do succeed in this aim to a reasonable degree. 

Applications. Problem-solving tests in science call for the application
of knowledge and may demand one or more types of scientific
thinking. Similarly, test items that involve the interpretation of new
situations demand more than mere recall and, thus, are measures of
ability to use scientific knowledge or judgment. Such test items
should find a more extensive place in testing procedures than they
have thus far been given.

Attitudes and interests. Some progress has been made in the 
measurement of pupils’ attitudes toward and interests in science 


defined very clearly. Furthermore, it is not too clear how pupils 
should be tested, or what should be the content of the test that will 


dynamic sense. Some significant attempts have been made in the 
measurement of scientific attitudes, however. 


3 STANDARDIZED TESTS IN THE SCIENCES 


Standardized tests in course areas 


There is considerable variety among the standardized tests in the
secondary-school sciences. Although the majority of tests are for
general science, biology, physics, and chemistry, several tests for the
junior high school and at least one or two for the senior high school
cover the science field broadly and without reference to particular
courses. Most of the tests are of the achievement type, although
instructional tests and aptitude tests are found for some of the science
subjects.

General Science. Significant development of a unified course in 
science continues in Grades 7, 8, and 9. Tests are available for the 
general science course typically given in the ninth grade. Recently 
revised test batteries, such as the Stanford Achievement Examina- 
tions and the Metropolitan Achievement Tests, have general science 
or natural science sections suitable for use in junior-high-school 
classes. The Van Wagenen General Science Reading Scales, while 
rather old, are still quite useful for the measurement of ability to 
read science material of varying degrees of comprehension difficulty. 
The Read General Science Test is representative of end-of-course 
tests in this area. It includes 75 items on light, sound, heat, mechanics, 
electricity, chemistry, weather, astronomy, nutrition, genetics, disease 
and health, conservation, and geology. The Glenn-Gruenberg Instruc- 
tional Tests in General Science serve very well for diagnostic and 
inventory purposes, especially in the older types of course
organization.

Biology. The majority of objective tests in the field of biology
deal almost entirely with the measurement of facts, information, and
principles. The less tangible outcomes, such as interests and
appreciations, have consistently avoided identification and measurement.
The Nelson Biology Test is an attempt to validate an end-of-course
test measuring knowledge and understanding of facts, concepts,
and principles; ability to identify cause and effect relationships;
and ability to apply what is learned to lifelike situations. The
75 items comprising this test are based on criteria presented in the
Thirty-First, Thirty-Fourth, and Forty-Sixth Yearbooks of the
National Society for the Study of Education.

Physics. It is not difficult to measure the factual and informational
aspects of physics. The understanding of the relationships of
facts or of the generalized idea, which is one of the main objectives
of good science teaching, is also capable of measurement, but this is
much more difficult than merely testing for memory of facts.
Problem-solving abilities, valuable in that they require the pupil to do
reflective thinking, are also measurable. Two further qualities,
basic to effective work in physics, also lend themselves quite readily
to measurement: (1) the ability to read scientific material,
and (2) the mastery of underlying mathematical skills.

Chemistry. Many of the implied outcomes of instruction in chem- 
istry cannot be measured accurately and objectively. It is possible, 
however, to measure with reasonable accuracy such elements as (1) 
factual matter or informational items of the course, (2) specific 
techniques, such as determining valence relationships and balancing 
equations, (3) ability to solve chemical problems, (4) laboratory 
procedures, and (5) certain aspects of scientific attitudes. Tests of 
achievement predominate in this field. Representative of tests in this 
field is the Anderson Chemistry Test, standardized for use in first 
courses in high-school chemistry. This is an end-of-course test of 80 
items measuring (1) factual information, (2) understanding of prin- 
ciples and their practical applications, (3) the elements of the scien- 
tific method together with its associated attitudes in chemical situ- 
ations, and (4) ability to use formulas and to solve problems. 


Standardized test methods 


Sample items illustrative of the manner in which various objective 
item forms are used in secondary-school science testing are presented 
here. The student should utilize the sample items together with the 
bibliography at the end of the chapter for information concerning 
standardized tests as well as for suggestions on types of informal 
objective items suitable for use in the high-school sciences. 

Simple recall and completion items. Simple recall and completion 
items typically differ only slightly in form and not at all in the nature 
of the pupil’s response. The following sample shows the manner in 
which simple recall items can be used with pictorial representation. 


Sample A.5 


This diagram illustrates the facts of
the .......... of light.    a


The amount of apparent bending of
the stick depends upon the ......
...... of the liquid.    b


5 Giles M. Ruch and Herbert F. Popenoe, Ruch-Popenoe General Science Test.
Published by World Book Co., 1923.
True-false items. The following sample of true-false items
illustrates one of the few applications of this item type in high-school
science tests. 


Sample B.6


( ) 42. The degree of ionization of a substance depends upon the 
concentration. 


( ) 43. A chlorine ion has fewer electrons than a chlorine atom. 
( ) 44. The majority of the carbon compounds are highly ionized. 


( ) 45. The United States has the most important sulphur deposits 
known. 


Multiple-choice items. By far the most popular item form in
high-school science tests, the multiple-choice type, is used in several
different adaptations. Samples C to E show sample items of the
common type, an item based on diagrams, and an item based on a
passage to be read.


Sample C.7


23. The sun appears to rise earlier than it actually does. This is due to 
the earth's atmosphere causing the light waves to be— 


1. diffused.
2. reflected.
3. dispersed.
4. polarized.
5. refracted.


24. The phenomenon that best supports the hypothesis that light is a 
form of transverse wave motion is called— 


6. polarization. 
7. refraction. 
8. interference. 
9. reflection. 
10. dispersion. 


6 Ernest Kirkpatrick, Kirkpatrick Chemistry Test, Test II. Published by Bureau 
of Educational Measurements, State Teachers College, Emporia, Kan., 1940. 
7 Gordon M. Dunning, Dunning Physics Test. Published by World Book Co.,
1950.
Sample D.8


49. If a 500-pound weight is placed at the arrow, which lever 
will lift the 60-pound weight W the highest? 


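[Diagram of five levers, numbered 1 to 5, not reproduced; the figures printed beside each lever are:]

1.  3  1
2.  2  1
3.  1  1
4.  1  2
5.  1  3


Sample E.9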


Water takes up oxygen from the air in varying amounts. Cold water 
will take up small quantities of oxygen while warm water takes up almost 
none. Running water will dissolve (that is, take up) more oxygen than 
standing water. Water in which plants are growing contains much oxygen 
because the green plants give off oxygen in the process of photosynthesis. 
When there is not enough light for plants to manufacture food, they do 
not give off oxygen but consume it in respiration. Water animals also use 
oxygen in respiration so that the amount of oxygen found in water is 
always changing. The oxygen content of an aquarium changes from day 
to day and from hour to hour and is different even at different levels in 
the aquarium. 


13. Standing water takes up 
13-1 more oxygen than running water. 
13-2 as much oxygen as running water. 
13-3 less oxygen than running water. 
13-4 a great deal of oxygen. 
13-5 no oxygen. . . . . . . . . . 13( )


Matching exercises. Two samples of the matching test are given
below. The first, illustrating an identification test, requires the matching
of parts of the digestive tract and their pictorial representation,
and the second is a matching unit based on diagrams and having some
elements in common with the multiple-choice form.


8 John G. Read, Read General Science Test. Published by World Book Co., 1950.
9 John G. Zimmerman and Richard E. Watson, Cooperative Science Test for
Grades 7, 8, and 9, Form K. Published by Cooperative Test Service, 1941.


Sample F.10


In this diagram of the digestive tract: 


a The small intestine is lettered .... a
b The esophagus is lettered .... b
c The liver is lettered .... c
d The stomach is lettered .... d
e The pancreas is lettered .... e


Sample G.11


1 A leucocyte
2 An ovum
3 A nerve cell
4 A digestive epithelium cell
5 A section of cartilage

4. D represents . . ( )
5. E represents . . ( )
6. F represents . . ( )


10 Ruch and Popenoe, op. cit.
11 F. L. Fitzpatrick, Cooperative Biology Test, Form Q. Published by Cooperative
Test Service, 1940.
4 INFORMAL OBJECTIVE TESTING IN THE SCIENCES 


Objective items of the types illustrated above have been used
quite widely by informal objective test makers in the evaluation
of the more intangible outcomes of science instruction.12 Furthermore,
rather complex adaptations of the common item forms have
also been used. There seems excellent reason to believe that much
of the most significant recent testing in the science field has been
done by informal objective testing methods. Space does not permit
many illustrations of this type of approach to the measurement of
scientific knowledges and abilities here, but the several illustrations
below are supplemented by some of the evaluative and interpretive
tests for secondary-school science illustrated in Chapter 9.


Measurement of broad instructional outcomes 


An informal, semi-objective test for teaching more than for testing 
purposes was devised by Davis 13 for use in eighth- or ninth-grade 
science courses in the measurement of other than largely factual 
instructional outcomes. The following reproduction of the instruc- 
tions to pupils and of the first paragraph of the selection to be read 
and evaluated by the pupils will serve to show the nature of the 
instrument. 

TO THE PUPIL


Here is a test which I think you will find quite different from any you
have ever taken. It is a story about Johnny Jones. He was quite an
active boy, but sometimes he was a poor scientist. Some of his friends
and the members of his family may not have been good scientists either.
Whenever you find something in the story which does not agree with
what you think good science means, put a pair of parentheses ( ) around
the sentence or part of a sentence where you find this. Next, at the
border of the paper beside the error, write in the correct letter from the
following list:


12 For example, see Louis M. Heil and others, “The Measurement of Understand- 
ing in Science.” The Measurement of Understanding, Forty-Fifth Yearbook of the 
National Society for the Study of Education, Part I. University of Chicago Press, 
Chicago, 1946. Chapter 6. 

13 Warren M. Davis, “A Science Test Designed To Teach and Measure Outcomes
Other Than Memorization of Factual Information.” Science Education, 23:371-72;
December 1939.
S means that Johnny or some one else was superstitious.
D means that what was being done or had been done was dangerous.
O means that statements are being taken or have been taken for truth
without any proof being offered.
J means something unscientific for reasons other than S, D or O. If you
use the letter J be prepared to tell the class what was wrong with the
story at the point where you use this letter.
Now go on with the story.

Johnny’s Day
Johnny Jones woke from a sound sleep one morning and 
noticed that the sun was already shining in his window. With- 
out looking where he was going he jumped to the floor and 
started gathering up his articles of clothing to put them on. 
Suddenly he stopped and said, “Shucks, it’s Saturday, no need 
for me to hurry. But it might just as well be a school day,” he 
went on as he looked out of the window, "it's sure to rain today. 
Old man Smith said this was a wet moon." 


The remaining parts of the selection, running to perhaps 1100 
words, included many additional evidences of behavior or reasoning 
illustrative of the types of situations covered by the S, D, O, and J 
methods of marking the selection. One point of credit was assigned 
for each pair of parentheses placed approximately in the correct 
position and an additional point of credit was assigned for each 
pair of parentheses accompanied by the proper identifying letter in 
this semi-objective test. 
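
The two-point scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Mark and score_paper, the idea of identifying each bracketed passage by a sentence index, and the sample key are all hypothetical and do not come from Davis's article. "Approximately in the correct position" is simplified here to an exact match on the sentence index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """One pair of parentheses placed by a pupil (hypothetical record)."""
    sentence: int  # assumed index of the sentence the pupil bracketed
    letter: str    # letter written in the margin: S, D, O, or J


def score_paper(marks, key):
    """Score one paper; `key` maps sentence index -> correct letter."""
    score = 0
    credited = set()
    for mark in marks:
        # No credit for brackets around sound science or for repeats.
        if mark.sentence not in key or mark.sentence in credited:
            continue
        credited.add(mark.sentence)
        score += 1  # parentheses placed in the correct position
        if mark.letter.upper() == key[mark.sentence]:
            score += 1  # proper identifying letter as well
    return score


# Hypothetical key and paper, invented for illustration only.
key = {3: "O", 7: "S", 12: "D"}
marks = [Mark(3, "S"), Mark(7, "S"), Mark(9, "J")]
print(score_paper(marks, key))  # prints 3
```

In this sketch the pupil earns one point for sentence 3 (right place, wrong letter), two points for sentence 7, and nothing for sentence 9, which the assumed key does not list as an error.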

The following multiple-pattern example of an informal objective 
approach to the measurement of the ability to apply principles, 
adapted from tests prepared by the Evaluation Staff of the Pro- 
gressive Education Association, Commission on the Relation of 
School and College, is representative of a testing approach coming 
into wide use in the fields of secondary school science.14


When an egg is cooked in an open kettle of boiling water on a high 
mountain, the cooking time is 


a. greater than the cooking time at sea level ............ ( ) a.
b. less than the cooking time at sea level .............. ( ) b.
c. the same as the cooking time at sea level ............ ( ) c.


14 Science in General Education. Report of the Committee on the Function of
Science in General Education, Commission on Secondary School Curriculum,
Progressive Education Association. D. Appleton-Century Co., Inc., New York, 1938.
p. 416-17.
Check the statements below which give the reasons or reason for your
explanation above.


d. Water boils at the same temperature everywhere ...... ( ) d.

e. Just as automobile radiators boil more frequently and
quickly at high altitudes, so eggs will cook more quickly
on a high mountain ................................ ( ) e.

f. A reduction in the boiling point accompanies a reduction
in the pressure above the water .................... ( ) f.

g. A reduction in the cooking temperature calls for an increase
in cooking time or an increase in cooking temperature
calls for a reduction in cooking time ............ ( ) g.

h. A reduction in air pressure accompanies an increase in
altitude .......................................... ( ) h.

i. The boiling point of the water rises as the pressure above
the water becomes greater ........................ ( ) i.

j. Decreased air pressure on mountain tops decreases the
efficiency of fires for cooking purposes ............ ( ) j.


Measurement of scientific attitude 


Noll listed the following six abilities as essential to the scientific
attitude: (1) accuracy in all operations (calculation, observation,
and report), (2) intellectual honesty, (3) open-mindedness, (4) the
habit of looking for natural causes, (5) the habit of suspended judgment,
and (6) the habit of criticism.15 Although he admitted that
other habits might be included in such a list, he stated that a
person who met all of the conditions listed above would possess the
scientific attitude and would also be highly unique even in this
scientific era.

Suggestions concerning how each of these six essentials of scientific 
attitude can be measured informally were also presented by Noll.16
Some of his illustrations are reproduced to show techniques useful 
in measuring scientific attitude. 

(1) Accuracy in calculation—arithmetic examples. 
Accuracy of observation and report—questioning a pupil concerning 
the characteristics of an animal picture, plant, or diagram. 


15 Noll, op. cit. p. 23-26. 
16 Ibid. p. 34-37. 


(2) Intellectual honesty. 


T F When a pupil makes a poor mark in an examination it is 
usually because he is not well or he was up late the night 
before. 


T F It is perfectly justifiable not to pay one's fare on a bus or 
street car if the conductor doesn't come around to collect it. 
(3) Open-mindedness. 
T F All Indians are dirty. 
T F College professors as a rule would be failures in any line of 
work but teaching. 
(4) Cause and effect relationships. 
T F Finding a horseshoe means that one will have good luck. 


T F Giraffes have such long necks because through many gen- 
erations they have been stretched a little longer each time. 


(5) Suspended judgment. 
T F My neighbor is away from home most of the time. He 
must be a traveling salesman. 


T F Mr. Jones bought a new car last week. He must have had 
an increase in salary. 


(6) Criticism. 
T F One can always accept as true what is printed in a book. 


T F If my science teacher says a thing is so, it must be so. 


Another approach to the measurement of similar types of out- 
comes is that of Davis, who presented the directions to pupils 
and a few sample items from a test for measuring knowledge of 
cause and effect relationships.17 Students were asked to indicate by 
the use of the appropriate letter of the following 


A—If the first occurrence is practically the sole cause of the second. 

B— If the first occurrence is one of a number of the important con- 
tributing causes of the second. 

C—If the first occurrence contributes only slightly to the second. 


17 Ira C. Davis, “The Measurement of Scientific Attitudes.” Science Education, 
19:117-22; October 1935. 
D—If both occurrences are results of the same general cause or causes. 
E—If the first occurrence bears no causal relationship to the second. 


their reactions to such items as these 


1. The sun shines on the earth; the earth is warm.
2. A boy often picked up toads; the boy had warts on his hands.
3. The light of lightning; the accompanying thunder.
4. The ignition switch of an auto is turned on; the motor starts running.
5. A rising column of air was cooled; a cloud formed.


Davis also gave similar illustrations from a test designed to
measure ability to distinguish between fact and theory.18 The appropriate
letter from this list


A—Some are statements of well established facts which are always true. 
B—Others may be statements of well established theories which are
generally accepted.
C—Others may be statements of theories which are questioned by some 
(many) authorities. 
D—Others may be statements of popular beliefs which are not supported 
by evidence. 


was to be used in responding to each of these statements 


1. A disease is a punishment for some particular moral wrong.
2. Air is composed of molecules.
3. The pressure in water varies with the depth.
4. Heating the molecules in air increases their speed.
5. A high forehead indicates high intelligence.


Measurement of superstitious beliefs 


Zapf presented a technique for measuring the manner in which 
pupils actually behave in situations to which well-known superstitious
beliefs apply.19 Pupils were placed in a closed room, where they
opened boxes in which were found directions for their subsequent 
action asking that they go contrary to widely held superstitious 
beliefs. The extent to which they performed the actions was taken 
as an indication of the degree to which they were not governed in 


18 Ibid. 


19 Rosalind M. Zapf, “Superstitious Beliefs.” School Science and Mathematics, 
39:54-62; January 1939. 
their behavior by these beliefs. Such situations as breaking a mirror, 
walking under a ladder, and opening an umbrella indoors were 
among the twelve used in the test. Although all thirty-two pupils 
tested in these situations had previously indicated that they did 
not believe in the superstitions, only two pupils went contrary to all 
twelve superstitions and two pupils acted superstitiously in five of 
the twelve situations. 
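
One way to summarize such a behavior record, offered only as a sketch, is the fraction of situations in which the pupil carried out the action that ran contrary to the superstition. The function name contrary_fraction and the use of a simple fraction are assumptions made for illustration; the passage above does not state how Zapf summarized each pupil's record.

```python
def contrary_fraction(acted_contrary):
    """Fraction of test situations in which the pupil defied the superstition.

    `acted_contrary` holds one boolean per situation: True if the pupil
    performed the action that goes against the superstitious belief.
    """
    if not acted_contrary:
        raise ValueError("at least one situation is required")
    return sum(1 for acted in acted_contrary if acted) / len(acted_contrary)


# A pupil who acted superstitiously in five of the twelve situations.
pupil = [True] * 7 + [False] * 5
print(round(contrary_fraction(pupil), 2))  # prints 0.58
```

A pupil who went contrary to all twelve superstitions would receive 1.0 on this assumed index.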


5 APTITUDE TESTING IN THE SCIENCES 


As was pointed out in preceding chapters for aptitude tests in 
English, foreign languages, and mathematics, aptitude tests in the 
science fields are used primarily for senior pupils in high school or 
for college freshmen in obtaining a basis for sectioning of classes and 
various types of educational guidance. These aptitude tests pre- 
suppose little or no study of the subject, but attempt to measure 
those specific intellectual abilities and those skills resulting from 
general training which make for success or failure in college courses 
in the subject. 

Aptitude tests in the science field are available in the Iowa 
Placement Examinations for chemistry and physics. Various parts 
of these tests deal with such aptitudes and abilities as the simple 
arithmetic and algebra basic to the subject, comprehension of para- 
graphs of scientific material, deductive reasoning with scientific 
materials, and understanding of commonly-used terminology of the
science.


6 DIAGNOSIS AND REMEDIATION IN THE SCIENCES 


Limitations of diagnostic and remedial techniques in science 


Diagnostic procedures and remedial work in the field of science 
instruction are not highly developed. While certain of the available 
tests may show pupils to be deficient in some specific phase of 
science information, the majority of such tests do not point out the 
causes of the deficiencies. Practically all that can be done by way 
of diagnosis is in connection with certain skills that appear to be
basic to the study of science.

The study of science involves the comprehension of a language
peculiar to the subject. Reading of scientific content is apt to be
difficult. Thus, poor reading ability may form the basis of poor 
accomplishment in the subject. Diagnosis of reading abilities of the 
work-study type, accompanied by remedial instruction designed to 
overcome the weaknesses revealed, is one of the prerequisites to 
satisfactory progress in the study of the sciences. Laboratory work 
may call for many new abilities and techniques. 


Future of diagnosis in science 


There is considerable promise for the future of diagnosis and 
remediation in the sciences through further development of the 
evaluation techniques illustrated in the preceding section of this 
chapter. The emphasis so far has been more upon the construction of 
valid evaluation procedures for the less tangible outcomes of instruc- 
tion than upon diagnostic values of the techniques. The writers 
believe, however, that constructive diagnostic and remedial pro- 
cedures may well grow out of this new approach to the measurement 
of ability in the sciences. 


Topics for Discussion 


1. Why is objective measurement in the sciences not highly developed? 

2. Enumerate and evaluate the aims of science education. 

3. In your opinion, is the need for a unified course in science in the 
high school any less serious than it is in the social studies? 

4. What are the most important measurable outcomes of instruction in 
science? 

5. Examine the possibilities of measuring the major outcomes in science 
and specify a number of techniques for each. Would such a list 
parallel the types found in the social studies? 

6. In your opinion, what is the relation between factual knowledge and 
problem-solving ability in the sciences? 

7. Suggest some of the objective item types useful in science testing and 
illustrate them with items of your construction. 

8. Discuss and evaluate the informal objective test approaches to the 
measurement of some of the more intangible outcomes of science 
instruction. 


Measuring and Evaluating 
in the Fine Arts 


THE FOLLOWING possibilities of measurement of aptitudes and 
achievement in the fine arts are discussed in this chapter: 


A. Social and educational significance of the arts.
B. Educational emphasis on music and art.
C. Basic elements of musical talent.
D. Measurement of musical accomplishment.
E. Measurement of art appreciation.
F. Measurement of artistic ability.


The objective measurement of aptitudes and achievement in the 
fine arts is a relatively recent accomplishment—so recent, in fact, 
that there is still an echo of protest from a small group of artists that 
artistic production does not lend itself to objective evaluation. In 
spite of this feeling, however, much progress has been made in these 
fields of measurement. This is as it should be, for certainly in these 
cultural subjects is to be found much of the best that the educational 
program affords. With the trend of recent years in the direction of 
greater individual leisure for the cultural pursuits, the need for a 
better understanding of the content, aims, and methodology of these 
artistic subjects is greater than ever before. 

There is perhaps a certain advantage in the fact that develop- 
ments in the measurement of the fine arts have taken place somewhat 
slowly. In general, research techniques have improved, with the net 
result that the problems of measurement in these fields have been 

553 
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more critically analyzed and attacked with more refined instruments, 
The careful research of such critical workers as Carl Seashore, 
Schoen, Stanton, Kwalwasser, and Dykema—to name only a few in 
the psychology and pedagogy of music—and the work of Thorndike, 
Ayer, Meier, Manuel, Winslow, and Whitford—as an incomplete 
sampling of important names in the field of art—are evidences of 
the influence of this scientific point of view. 


1 MEASURABLE QUALITIES IN MUSIC 


Musical talent and achievement 


Measurement in music takes two major lines of approach. The 
first is the determination of basic aptitudes. Here, as in other sub- 
jects, the techniques and instruments used are psychological. Such 
instruments have been mentioned previously in this volume as tests 
of specialized intelligence, since they have to do with the determina- 
tion of tendencies to respond in certain ways to specific types of 
musical stimuli. Accomplishment in music depends to such a large 
degree upon the existence of aptitude that this phase of measurement 
must be given primary emphasis. The mere existence of aptitude in 
music is in no sense an index to musical accomplishment, however. 
The second approach to the problem is pedagogical and is based upon 
the use of achievement tests for the threefold purpose of measuring 
the knowledges, skills, and appreciative aspects acquired as a result 
of training. As Kwalwasser 1 pointed out: “Regardless of the talent 
possessed, one must have the will to succeed or little is attained ... 
There are a vast number of reasons why an individual of superior 
endowment may realize but a very small return on his native 
musicianship.” 


Major aims and outcomes of music education 


The statement of aims and outcomes of music education that ap- 
peared in the 1921 report of the Educational Council of the Music 
Supervisors’ National Conference 2 has not been greatly improved 
upon since that time. The statements of instructional outcomes and 
the recommended standard course of study in music performed two 
very admirable functions: (1) they provided tangible goals for 
teachers and supervisors, and (2) they provided defensible criteria 
for the validation of tests of musical accomplishment. The student 
in this field will do well to investigate this report. 


1 Jacob Kwalwasser, Tests and Measurements in Music. C. C. Birchard Co., 
Boston, 1927. 

2 Report of Educational Council of the Music Supervisors’ National Conference. 
National Education Association, Washington, D. C., 1921. 

A more recent and very useful statement of the major goals of 
music education is that presented by Brooks and Brown.3 It is be- 
lieved that this general summary of elementary-school music instruc- 
tional goals also affords a very useful basis for the validation of 
improved tests in the fields of music information, accomplishment, 
and aptitudes at the secondary-school level. Fifteen of these major 
goals are reproduced here and are summarized with minor modifi- 
cations under the following seven practical categories: 


1. In Song Singing 
Ability to . . . use the voice to express and convey musical meaning 
in free, spontaneous, and beautiful song singing and with artistic 
interpretation. 


2. In Chorus 
Ability and disposition to associate with others . . . in joint render- 
ing of music in chorus singing.... 


3. In Appreciation and Its Background 

a. Discrimination and taste in music with evidence of preference for 
that which has excellence and worth. 

b. Sensitiveness to ordered perfection of structure and design in 
music both in song and in instrumental compositions and realiza- 
tion of aesthetic satisfaction in the beauty, appropriateness, and 
adequacy thus seen and expressed. 

c. Integrated volitional structure in personality with reference to 
selection in music, which leads to the choice and use of music 
which has high excellence in contrast to that which is inferior 
in quality.... 

d. Understanding of some phases of the development of music and 
some insight into the essential nature and meaning of music and 
the forces and influences that have produced it—including knowl- 
edge about musicians, familiarity with compositions, acquaintance 
with instruments and how they developed—to an extent and on 
a level appropriate to children of that maturity. . . . 


3 Marian Brooks and Harry A. Brown, Music Education in the Elementary 
School. American Book Co., New York, 1946. p. 114-16. 


4. In Instrumental Music 

a. Ability to use instruments as a means of musical expression and 
with satisfaction in such experience. 

b. Ability to handle with manipulative skill such musical instru- 
ments as are used. 


5. In Creative Music 
Ability to use individual originality and personal initiative in 
interpreting, using, and creating music. 


6. In Connection with the Musical Score 

a. Ability to read musical meaning fluently from the printed score. 

b. Ability to use musical notation to express or record musical 
meaning. 

c. Understanding of selected phases of the theory of music to an 
extent and on a level appropriate to pupils of that maturity, as 
essentially a functional approach to music literature and as a 
means toward a broader interpretation, including such elements 
of musical structure as accent, measure, phrase and period, scale 
and chord building, lines and spaces, key signatures. 


7. In Connection with More Than One Phase of Music Education 

a. Complete freedom from inhibitions arising from focal attending 
to mechanical processes, accomplished by the development of an 
habitual-response pattern that releases conscious attention from 
the mechanics and structural details and permits complete ab- 
sorption in getting or expressing meaning particularly (1) in sing- 
ing, absence of conscious attention to the manner and the acts 
involved in utterance, (2) in interpreting the musical score in 
reading music, freedom from focal consciousness of the structural 
elements involved in the symbol perception necessary in gaining 
musical meaning, (3) in instrumental music, absence of focal 
attention to the finger manipulations and other physical move- 
ments connected with handling and managing the instrument. 

b. Ability to sense and feel the movement resident in music and to 
express it in bodily motion in some appropriate manner. ... 

c. Growth toward possession of music as a social institution on a 
child's level of comprehension and participation; manifested by 
(1) children thinking and feeling together in groups and co- 
operating in collective undertakings, such as taking part in group 
activities in music, (2) awareness of social units in the case of a 
number of children associated for a single purpose, (3) willing- 
ness to do one's part in the unison expression of common emo- 
tions, (4) ability to enter whole-heartedly and sincerely into 
joint enterprises intended for the good of all members, (5) 
a feeling of common understanding and congeniality when a num- 
ber of children are united in shared endeavor in pursuit of a goal 
which all have accepted, and (6) the inclination to subordinate 
one's self as an individual and to accept the role of follower 
when that contributes most to the welfare of the greatest number 
of individuals. 


In addition to the above general goals these same authors listed 
eighty “subsidiary goals which are contributory to the major goals 
and which may serve as guides to the teacher in the attainment of the 
major goals.” These goals must not “be considered as a course of 
study to be followed.” They “are constituent elements of the larger 
objectives. A listing of them should be a great aid to the classroom 
teacher and the college student who is preparing to teach music or to 
be a supervisor in that field.” 4 


2 MEASUREMENT OF MUSICAL TALENT 


Measurement of basic musical talent 


Tests of musical aptitude are designed to measure those largely 
innate musical capacities that constitute the individual's musical 
inheritance. Aside from the sheer physical endowment that certain 
types of musical expression demand, there are certain more or less 
psychological factors that determine an individual's musical talent. 
The identification of these factors calls for an unusually critical 
analysis. Without doubt one of the most extensive research programs 
ever undertaken for the purpose of isolating the elements of native 
capacity in a special field was that undertaken by Seashore and his 
students at the State University of Iowa. 

The Seashore Measures of Musical Talent are a battery of six tests 
on records for phonographic reproduction designed to measure six 
elemental abilities upon which response to musical training appears 
to depend. In the original form of the tests the six elemental abilities 
measured were (1) sense of pitch, (2) sense of intensity, (3) sense 
of time, (4) sense of consonance, (5) tonal memory, and (6) sense 
of rhythm. The testing situation involves careful and critical listen- 
ing. In each of the tests the individual listens to two tones and 
then records his judgment concerning the differences he hears. 


4 Ibid. p. 116. 
On the basis of many constructive criticisms based on the extensive 
research of his students and published comments concerning the 
tests, the original Seashore tests were completely rebuilt, appearing 
in revised form in 1940. In the revision, a test of the individual's 
ability to distinguish differences in timbre was added. This test 
involves fifty pairs of tones that differ in their harmonic structure. 
The time test and the rhythm test were also revised and improved 
by the use of pure tones of varied duration in the former and by tonal 
pulses as a means of creating the rhythmic patterns in the latter. 
The test on consonance was eliminated. In the 1940 revision the tests 
are arranged in two series: Series A is for general classroom use; 
Series B is adapted for use with specialized groups and in research 
studies. 

Experience with these tests indicates that the earliest age at which 
such group measures can be used effectively is at the fifth-grade 
level. The tests may be administered to groups, the size of the group 
depending somewhat on the acoustical qualities of the room. Natu- 
rally the stimulus must be heard clearly at all times. 

According to the author's own statement, 


“These measures present the following characteristics: they are based 
on a scientific analysis of musical appreciation and performance; they 
deal with elements which function in all music; they are standardized 
for content so that alternate or new series are not needed; they give 
quantitative results which may be verified to a high degree of certainty; 
they are economical in that expensive instruments are replaced by phono- 
graph records; they may be used with any language and at any racial or 
cultural level; they are simple and as nearly self-operative as possible; 
they are designed for group measurements; they are interpreted in terms 
of established norms.” 5 


The Kwalwasser-Dykema Music Tests are quite similar in form 
and function to the Seashore Measures of Musical Talent. The ten 
tests, designed for use in grades four to twelve, require five phono- 
graph records. The elements measured by the alternate-response 
technique are: (1) tonal memory, (2) quality discrimination, (3) 
intensity discrimination, (4) tonal movement, (5) time discrimination, 
(6) rhythm discrimination, (7) pitch discrimination, (8) melodic 
taste, (9) pitch imagery, and (10) rhythm imagery. 

5 Carl E. Seashore, Don Lewis, and Joseph Saetveit, Manual of Instructions and 
Interpretations for the Seashore Measures of Musical Talent. RCA Manufacturing 
Company, Inc., Educational Department, Camden, N. J., 1939. 


Measurement of musical memory 


Quite in contrast with the two tests of musical talent discussed 
above is the Drake Musical Memory Test, which measures musical 
aptitude by an entirely different technique. The test is designed for 
persons of any age above seven whether or not they have had musical 
training. The subject listens to twelve melodies played in their 
proper form or with variations in key, time, or notes. He records 
his responses to each of the 54 trials on a special record sheet to 
show whether he recognizes the nature of the difference, if any, 
between the melody itself, which is played first, and the various ver- 
sions of it which follow. 


Excerpts from Score Sheet, Drake Musical Memory Test 6 

There are 12 trials of entirely different melodies. 
Listen carefully to the first melody in each trial and remember it. 
Listen to what is played next and compare it to the first melody to [word illegible] 
a. if it is exactly the same as the first melody, . . . if so record S. 
b. if it is the same melody played in a different key, . . . if so record K. 
c. if the time has been changed, . . . if so record T. 
d. if any notes have been changed, . . . if so record N. 

S=exactly the SAME melody.    T=change of TIME. 
K=change of KEY.              N=change of one or more NOTES. 

Practice exercise No. 1. [ ]    Practice exercise No. 2. [ ] 

Record your answers in the score form given below. 
Each trial will be announced by number. When you hear a number announced you will know 
that a new melody is to be played to which all melodies that follow, in that trial, are to be 
compared. 
Record your answer during the short pause between each melody. Just time enough will be 
given to write your answer. 
There is never more than one kind of change in any one comparison. 
Fill in every square. Make the best judgment you can for each comparison. 
Write clearly with capital letters. 
In each trial, listen to the first melody. Wait until more is played and record whether it is the 
same, or if a change has been made in time, key, or notes. 
IF THERE IS ANYTHING YOU DO NOT UNDERSTAND ASK ABOUT IT NOW. 

Remember—    S=SAME    T=TIME change    K=KEY change    N=NOTE change 

SCORED BY ........    TOTAL ERRORS=FINAL SCORE 


6 Raleigh M. Drake, Musical Memory Test: Score Sheet. Published by Public 
School Publishing Co., 1934. 

The above reproduction of the directions and of the response section 
of the score sheet shows the manner in which the test is given and the 
manner of recording responses. 
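
For the reader who wishes to see the scoring rule of the score sheet (total errors equal the final score) worked out mechanically, the following is a minimal sketch in Python and not the publisher's procedure. The function name, the sample key, and the sample answers are hypothetical and are not taken from the published test. The sketch also assumes that an omitted square counts as an error and that a lower-case letter is accepted, neither of which the score sheet states.

```python
# Hypothetical scorer for an S/K/T/N answer sheet of the kind described above.
# S = same melody, K = key change, T = time change, N = note change.
VALID_CODES = {"S", "K", "T", "N"}


def count_errors(responses, key):
    """Return the number of comparisons answered incorrectly.

    Assumption (not stated on the score sheet): a blank or unreadable
    answer is counted as an error, and lower-case letters are accepted.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    errors = 0
    for given, correct in zip(responses, key):
        if correct not in VALID_CODES:
            raise ValueError(f"invalid key entry: {correct!r}")
        answer = (given or "").strip().upper()
        if answer != correct:
            errors += 1
    return errors


# Made-up key and answers for one trial of five comparisons
# (illustrative only; this is not the published key).
example_key = ["S", "K", "T", "N", "S"]
example_answers = ["S", "K", "N", "N", ""]
print(count_errors(example_answers, example_key))  # 2
```

Since the final score is a count of errors, a lower score indicates a better performance under this rule.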


3 MEASUREMENT OF MUSICAL ACHIEVEMENT 


The knowledge, skill, and appreciative outcomes of music instruc- 
tion are measured by a variety of tests of the pencil-and-paper 
variety. The majority of these instruments appear to measure the 
knowledge and skill objectives quite adequately, but they largely 
neglect the appreciative outcomes. This is not surprising, since 
appreciations are almost impossible to define and extremely 
difficult to measure. 

The Beach Standardized Music Tests were among the real pioneers 
in the measurement of musical achievement. Many of the elements 
measured by these tests are recognized among the tests of more recent 
development. The following qualities are scheduled for measurement 
by the test: 


1. Knowledge of essential facts of musical notation. 
2. Ability to hear and distinguish the component parts of music, namely 
the elements of time and tune both in isolated form and in melodies. 
3. Aural recognition of the structural elements of music fundamentally 
necessary for intelligent appreciation. 
4. Pitch discrimination. 
5. Musical memory. 
6. Sight singing through indirect methods. 
7. The writing of music. 


Measurement of musical knowledge 


Tests of musical knowledge are variously concerned with musical 
symbols and terms, time and key signatures, note and rest values, 
syllables, instrumentation of the orchestra, musical form, and the 
history and biography of music. Samples are given below to illustrate 
the measurement techniques rather commonly used. Multiple-choice 
and simple recall items and matching exercises appear to be most 
common among the testing techniques used, although the true-false 
item is used occasionally. The following samples are somewhat rep- 
resentative of the content of various tests. 
Sample A.7 
COMPOSERS OF FAMOUS COMPOSITIONS 


Directions: Below are the names of famous compositions. On the lines 
at the right you are to write the name of the composer of each. The 
sample is marked as it should be. 


Sample: The Elijah ...... Mendelssohn ...... 
1. March Slav .......................... 
2. To a [title illegible] .......................... 
3. The Unfinished Symphony .......................... 
4. Liebestraum .......................... 


Sample B.8 

In the major key signatures below determine what the name of each 
one is. Find that name above, take its number, and write it in the blank 
at right of each one, as shown at a. Ready! Go! 


[Staff notation of the key signatures not reproduced.] 


Sample C.9 

1. ( ) The viola is an alto horn. 
2. ( ) Violins are frequently employed in brass bands. 
3. ( ) The first violin section is seated to the left of the conductor. 
4. ( ) The harpsichord is one of the predecessors of the piano. 


Sample D.10 


[Music notation not reproduced. Captions: “The time signature is ...” and “The note needed is ...”] 


7 Jacob Kwalwasser, Kwalwasser Test of Music Information and Appreciation. 
Published by Bureau of Educational Research and Service, University of Iowa, 1927. 

8 Clara J. McCauley, McCauley Examination in Public School Music. Published 
by Jos. E. Avent, 1933. 

9 Kwalwasser, op. cit. 

10 Jacob Kwalwasser and G. M. Ruch, Kwalwasser-Ruch Test of Musical Accom- 
plishment. Published by Bureau of Educational Research and Service, University of 
Iowa, 1924. 
Sample E.11 
TEST 1. The Way Musical Instruments Are Played. 


Directions: Below is a list of musical instruments. They are not all played in the same way. In the blank space 
opposite each instrument write the letter 
A—for each one played by blowing.                 C—for each one played with a bow. 
B—for each one played by plucking, or picking.    D—for each one played by striking, or shaking. 
The sample is marked as it should be. 


Sample: Ukulele ... B (because it is played by plucking.) 


Begin here. 
1. Cornet __    7. Trumpet __     13. Trombone __    19. E-flat [illegible] __ 
2. Banjo __     8. Mandolin __    14. Guitar __      20. Harp __ 

Measurement of musical skills 


Among the musical skills most commonly measured by various 
tests are detection of pitch and time errors and recognition of 
melodies. Illustrations of each are given below. The first represents 
a type of matching situation and the second a recognition form of 
item measuring ability to detect errors. 


Sample F.12 


(e) Below are printed the opening strains of five familiar melodies. 
After reading or humming them one by one, select the title of each 
from the list of answers below. Then place the corresponding num- 
ber in the square at the right of each melody. The sample is correct. 


LIST OF ANSWERS 


1. Silent Night          5. America the Beautiful 
2. Old Black Joe         6. Auld Lang Syne 
3. Santa Lucia           7. Star Spangled Banner 
4. Home Sweet Home       8. America 


11 Glenn Gildersleeve and Wayne Soper, Musical Achievement Test. Published by 
Bureau of Publications, Teachers College, Columbia University, 1933. 

12 Frank A. Beach, Beach Music Test, Revised. Published by Bureau of Educa- 
tional Measurements, Kansas State Teachers College, Emporia, 1938. 
Sample G.13 
Test 3. DETECTION OF PITCH ERRORS IN A FAMILIAR MELODY 


Directions: The song “America” is written below. One measure has 
been crossed out because the melody is wrong. Five other 
measures are wrong. Hum over the melody to yourself 
and cross out all five wrong measures. 


Begin here: 


Measurement of music appreciation 


Only one test purporting to measure music appreciation, the 
Kwalwasser Test of Music Information and Appreciation, is known 
to the authors. Its approach is mainly through the testing of knowl- 
edges, many of which unquestionably carry appreciative values with 
them. However, many critics are not convinced that appreciations 
are measured directly, if, indeed, they can be measured in that 
manner. In view of the modern emphasis upon music appreciation 
for all pupils, it is unfortunate that the appreciative types of out- 
comes are not subject to satisfactory measurement. 


4 CHARACTERISTICS AND AIMS OF ART EDUCATION 


General trends in art education 


The conception of art for art’s sake, which dominated the field of 
art education for many years, has largely given way of late to the 
belief that all pupils should receive an opportunity in art courses to 
develop a sensitivity to beauty and critical taste in evaluating art 
objects. Hence, art is no longer thought of as a field only for the 
talented few. Creative self-expression, especially in the lower grades, 
and correlation of art with other activities of the school are important 
modern trends. These trends involve the use of a wide variety of art 
materials in the classroom. Extension of the content of art education 
courses beyond the drawing and painting, which largely constituted 
the curriculum in the past, particularly to industrial arts is another 
trend worthy of note. Last, and perhaps most important, the ap- 
preciative aims of art education have increasingly come to the front. 


13 Kwalwasser and Ruch, op. cit. 

Modern aims and purposes of art education in relation to current 
social needs are summarized effectively in the following statement: 
“Art in the modern school should aim both to stimulate in the child 
the experience of creating and to help him to improve the manner in 
which he expresses himself through creative processes; at the same 
time, it should aim to stimulate in him the experience of appreciating 
by acquainting him systematically with fine examples of the arts of 
various peoples, both of the present and of the past.” 15 


General outcomes of art education 


The problems of measurement in the arts are complicated by the 
fact that there is little agreement among workers in these areas on 
what the general as well as the specific instructional outcomes should 
be. If, as some hold, the primary objective of all art instruction is an 
increase in the power of the individual to recognize and “respond to 
essential aesthetic values, and to realize those values in action,” 16 
the measurable outcomes are rather vague and limited. To these 
individuals, such elements as artistic skill or dexterity, knowledge 
about the arts or art products, or the ability to analyze an art product 
structurally or historically, while objectively measurable, are of 
minor importance as defensible outcomes. 

There are numerous expressions of the general outcomes of art 
education, but these three appear to be most representative and of 
major importance: (1) information, (2) appreciation, and (3) ex- 
pression.17 It is quite probable that art appreciation is not necessarily 
taught, although real appreciation may be considered to rest to a 
large degree upon the broader aspects of information. There will still 
remain something in the truly artistic product which sheer informa- 
tion does not entirely explain. The third major objective might be 
better expressed as exploration. Not many potentially great artists 
are discovered in the elementary-school or high-school classroom, but 
practically all the great artists there are have come up through this 
avenue. Not everyone can express himself effectively in artistic form, 
but everyone has a right to explore for himself the fields of human 
expression in the hope that his own hidden talent may be uncovered. 
Art talent and achievement tests have distinct contributions to make 
in this field. 


14 Robert S. Hilpert, “Changing Emphases in School Art Programs.” Art in 
American Life and Education, Fortieth Yearbook of the National Society for the 
Study of Education. Public School Publishing Co., Bloomington, Ill., 1941. p. 452-53. 

15 Leon L. Winslow, The Integrated School Art Program. McGraw-Hill Book 
Co., Inc., New York, 1949. p. 43. 

16 James L. Mursell and others, “The Measurement of Understanding in the 
Fine Arts.” The Measurement of Understanding, Forty-Fifth Yearbook of the 
National Society for the Study of Education, Part I. University of Chicago Press, 
Chicago, 1946. Chapter 10. 

17 Walter H. Klar, Leon L. Winslow, and C. Valentine Kirby, Art Education in 
Principle and Practice. Milton Bradley Co., Springfield, Mass., 1933. 


Specific outcomes of art education 


The determination of specific outcomes of art education is further 
complicated by absence of definite agreement on the level of mastery 
to be expected at any given grade or maturity level. No special out- 
come or level of mastery of any outcome seems to be assigned as the 
responsibility of a particular grade. Kirby,18 in a discussion of aims 
and tendencies in art education, suggested that the elementary school 
faces the responsibility for bringing to the child a fourfold artistic 
experience. The first is the graphic experience, which expresses itself 
in representative drawing, illustrative and imaginative drawing, na- 
ture drawing, and other related forms. The second is the thoughtful 
experience, involving the constructive, decorative phases of artistic 
expression. The third experience involves the acquisition of motor 
skill in expression. The fourth is the emotional experience, which 
involves the appreciation of the arts. This last is by far the most 
difficult to evaluate. 

A somewhat more detailed expression of outcomes of instruction 
in art is given in the accompanying outline adapted from a course 
of study covering the first six years of the elementary school. It will 
be noted that this course is organized around four groups of outcomes 
which are quite similar to the artistic experiences presented in the 
previous paragraph and which also reflect the point of view presented 
for instructional outcomes on pages 168 to 170. 


18 C. Valentine Kirby, “Aims and Tendencies.” Pennsylvania School Journal, 
77:501-2; April 1929. 


OUTCOMES OF ART INSTRUCTION 19 


A. Fruitful knowledge 


Functional information 

Practical relation of art to everyday life (clothing, home, town, or 
city, etc.) 

Understanding of elements and principles of art and their adapta- 
tion to everyday use 

Knowledge of construction and industrial processes involved in art 
training 

Acquaintance with art of other countries 


B. Attitudes, interests, and appreciations 


Civic consciousness (civic pride) 

Appreciation and understanding of beauty in modern products of 
all kinds 

Interest in art museums, travel, and further study 

Interest in the civic, domestic, and social service of art 


C. Mental technique 


Good taste, discriminating judgment, ability to select and choose 
wisely 

Creative ability, originality, initiative, imagination, keen observa- 
tion 

Ability to analyze works of art and to understand the factors of 
beauty in production 

Keener observation; beauty of nature and fine things of art 


D. Right habits and skills 


Constructive thinking and planning 
Systematic organization 

Practical technique 

Coordination of mind, hand, and eye 
Freedom and spontaneity 

Order, neatness 

Body and mind training 
Self-activity 

Worthy use of leisure time 


19 Adapted from W. G. Whitford, An Introduction to Art Education. Century 
Co., New York, 1929. 


The accomplishment of these specific outcomes of instruction in art 
in the elementary school provides an excellent basis for the general 
art course in the high school. The following statements of purposes 
of such a general art course apply to the secondary-school level.20 


1. To develop appreciation (visual, mental, enjoymental) through 
a. The development of knowledge and understanding of art quality 
b. The development of a love for and an understanding of beauty 
in nature and in worthy works of man 


2. To give exercise in creating beauty (technique, motor) through 
a. The development of ability to choose and arrange fine objects 
for specific purposes 
b. The development of ability to originate, design, and produce fine 
objects with respect to standards of excellence 
c. The increase in visual experience 
d. The development of powers of invention, imagination, observa- 
tion 


5 MEASUREMENT OF ART ABILITIES AND ACHIEVEMENT 


Three types of tests may be distinguished in the field of art educa- 
tion: (1) drawing scales and tests, (2) art appreciation tests, and 
(3) art abilities tests. As many of the art tests cannot be illustrated 
easily, brief descriptions of a few representative scales and tests and 
one illustration are given on the following pages to familiarize the 
student with some types of measurement devices in this field. 


Drawing scales and tests 


Several rating scales for use in the evaluation of art achievement 
are now available. Such scales must depend, as do all scales, upon 
the representative nature of the specimens selected for presentation 
and the skill of judges in using the scales. Evidence from a study of 
the values of drawing scales indicates that their use reduces the 
inaccuracy of ratings to about one-half of that obtained when no 
scale is used.21 


20 Ibid. p. 443. 

21 Fowler D. Brooks, “The Relative Accuracy of Ratings Assigned with and with- 
out the Use of Drawing Scales.” School and Society, 27:518-20; April 28, 1928. 


The Kline-Carey Drawing Scales consist of series of samples for 
measuring (1) representation, and (2) design and composition. The 
first series uses such subject matter as a house, a running boy, a tree 
in silhouette, and a rabbit in scales having 14 samples, while the 
second uses the themes of illustrations, posters, structural designs, 
and borders. 


Tests of art appreciation 


The increasing stress placed upon the appreciative outcomes of 
art instruction of recent years results in a significant place for art 
appreciation tests among evaluative tools. Two tests of art judgment 
or talent are worthy of brief discussion here—the Meier Art Judg- 
ment Test and the McAdory Art Test. 

In the Meier Art Judgment Test, which may be given as an indi- 
vidual or as a group test, the pupil is confronted with 100 pairs of 
artistic specimens adapted from many sources. One of each pair 
of specimens has been changed in some specific element from the 
original form. The exact feature changed is specified in the record 
sheet on which the pupil records his reactions. A consideration of the 
complete series of paired specimens insures a comprehensive sampling 
of the various elements that enter into esthetic judgment. According 
to the evidence obtained by the author, this test measures the sensi- 
tivity of the individual to the effect that the composition as a whole 
produces on the observer. In order to give a better idea of the exact 
nature of specimens and the accompanying record sheets, a single 
pair of the etchings is reproduced here along with a brief sampling of 
seven items from the Test Record Sheet. The pair of specimens re- 
produced here is used with item 49 in the record sheet. In this item, 
the presence or absence of horns is the point to which the pupil is 
expected to give special consideration in making his judgment. 
The scoring key lists the drawing with horns as the one of greater 
merit. 

The Meier Art Judgment Test, which supersedes the Meier-Sea- 
shore Art Judgment Test, is made up of a smaller number of carefully 
selected items. The McAdory Art Test is somewhat similar to the 
Meier Art Judgment Test, but has only 72 pairs of plates, 24 of which 
are in color, and calls for reactions to a wide variety of materials, 
such as furniture, clothing, and rugs. 
Excerpts from Meier Art Tests, I, Art Judgment 22


DIRECTIONS
In the accompanying booklet are pictures arranged in pairs, the two in each pair being very nearly alike. They differ
only in one respect and you are told what that is in each case on pages 1, 2, and 3 of this blank.
You are to compare the two pictures in each pair, noting the unlike portion, and then decide which one is better
(more pleasing, more artistic, more satisfying). Do not hurry. Study each pair carefully in turn.
Indicate your preference by making an X in the circle under Left, if you decide that the left-hand picture is better,
or in the circle under Right if you believe that the right-hand one is more desirable.
Examples of proper marking: (pictures not illustrated).
Left Right No.
(X)  ( )   A  Presence or absence of tree. (This would mean that you prefer the left-hand picture.)
( )  (X)   B  Treatment of waves. (This would mean that you prefer the right-hand picture.)

Select the better one in every pair. Do not omit any. If unable to decide within a reasonable time mark the place
and return to that one later.
Pair No. Difference
1 Arrangement of wall and foreground
2 Foreground
49 Inclusion or omission of the horns
50 Arrangement in picture of the woman and umbrella
Position of the figures
99 Direction of pine tree's main branch
100 Treatment of the water


Art abilities tests 


Two tests that purport to measure art abilities, mainly the outcomes
of art instruction, are the Lewerenz Test in Fundamental 
Abilities of Visual Arts and the Knauber Test of Art Ability. Their 
major values seem to be at the junior and senior high-school levels, 
although the first is designed for use as low as the third grade. 


22 Norman C. Meier, The Meier Art Tests, I, Art Judgment. Published by Bureau 
of Educational Research and Service, University of Iowa, 1940. 
The Lewerenz Test in Fundamental Abilities of Visual Arts is 
designed to measure aspects of art ability in nine areas: (1) recognition
of proportion, (2) originality of line drawings, (3) observation 
of light and shade, (4) knowledge of subject-matter vocabulary, (5) 
visual memory of proportion, (6) analysis of problems in cylindrical 
perspective, (7) analysis of problems in parallel perspective, (8) 
analysis of problems in angular perspective, and (9) recognition of 
color. 


Applied art 


Education will have failed in much of its social responsibility if it 
allows the individual to leave the school without developing in him a 
rather definite love of the fine arts, even though it may be on a rela- 
tively low level. Not everyone can or should become a musician, a 
painter, or a sculptor, but almost everyone has the essential elements 
which make for a love and appreciation for the beautiful which he 
himself may be unable to produce. Instruction in the fine arts should 
cultivate and develop these elements. Furthermore, such instruction 
has a rather definite responsibility for making the arts function in 
real life in matters of personal adornment and in the planning and 
decorating of the home. The general cultural level of the population 
will be raised as this point is recognized and applied in instruction 
in the fine arts. 


Topics for Discussion 


1. In your opinion is there any reason to assume that achievement in 
the fine arts may not be objectively measured? Support your answer. 


3. What are the major types of aims in music education? 
4. Which of the aptitude tests discussed here seem to be most soundly 
grounded in critical research? 

5. Briefly discuss and illustrate the manner in which musical knowl- 
edge and musical skills are measured. 

6. What is the status of standardized tests of musical appreciation? 

7. What similarities in the basic problems of measurement do you see 
in the fields of music and art? 

8. What are the major classes of general outcomes in art instruction? 

9. Which of the art tests described here seem most adequately to 
measure the major features of accomplishment in art? 

10. Distinguish clearly between art appreciation and art abilities tests 
with respect to their functions and forms. 
Measuring and Evaluating in the 
Industrial and Practical Arts 


THE FOLLOWING points involved in the measurement and evaluation 
of the industrial and practical arts are treated in this chapter: 


A. Social and educational significance of the industrial and 
practical arts. 
B. Major objectives of industrial education. 
C. Major objectives in household economics. 
D. Testing in industrial education. 
E. Testing in household economics. 
F. Aptitude, vocational, and trade tests. 
1 SOCIAL AND EDUCATIONAL SIGNIFICANCE OF INDUSTRIAL 
AND PRACTICAL ARTS 


Industrial and practical arts 


In addition to its other responsibilities, education must make itself 
immediately useful to those engaged in the process of living in a very 
practical world. The term "industrial education" itself emphasizes 
the need for an education that is a preparation of all for successful 
living in an industrial society, as well as a specific preparation of 
some for the more specialized types of productive labor in this same 
society. Thus the industrial arts face many difficult problems in 
meeting these demands. 

Formerly vocational courses were designed to prepare the individual 
for a more effective entrance into and pursuit of a particular 
kind of occupation. In our modern educational program courses of 
this type are not particularly important, because of the high state of 
specialization of the industrial system. Factories and plants have 
become so mechanized and the type of work each workman is called 
upon to do is so specialized that there is little practical need for the 
genuinely vocational course. Furthermore, most industrial establishments 
have found that much of what is taught the pupil in the vocational 
courses must be unlearned before he becomes an effective 
workman in the narrow tasks he is called upon to perform. Practically 
all industrial organizations employing any number of workers main- 
tain their own training schools in which the workers are taught their 
specific tasks under factory conditions. But this does not mean that 
industrial education is not receiving any emphasis in the general 
program of education. It merely means that the emphasis has 
changed. Specialization in the vocations and in manufacturing proc- 
esses has made it more and more important that the school build up 
a rich prevocational background rather than technical skill. This the 
manual and industrial arts courses of the junior and senior high 
schools are now attempting to do. 

The manual arts courses have a place in the general program of 
education corresponding to their effectiveness in helping men to 
become socially efficient. Such courses contribute to social efficiency 
in different ways. To some individuals they may give vocational 
power through actually contributing to the individual's ability to 
earn a livelihood. To all they should give first-hand knowledge of the 
material accessories of life. Man's physical welfare and happiness 
depend to a large degree upon his ability to select or construct, to 
understand, and to use modern conveniences. Moreover, the manual 
arts in their modern form do much to develop an appreciation of 
beauty in relation to material, form, color, tone, and texture. This 
is only an element in aesthetic enjoyment, but it may contribute 
largely to efficiency and productivity. 

In the elementary grades, the individual pupil is best served by 
the manual arts courses that give him a broad foundation of first- 
hand contact with a large variety of materials and processes rather 
than specific vocational or technical training. Here the aim is not so 
much to teach techniques as it is to stimulate and guide the indi- 
vidual's natural constructive activity and to utilize the opportunities 
that naturally present themselves at this age for expression through 
concrete materials. In the junior-high-school grades it may be assumed 
that the child is ready, both mentally and physically, to form 
definite habits and to develop definite skills in the use of his hands. 
The end sought may be vocational, general, or both, but in any case 
the manual arts should be so correct in technique, should place such 
emphasis on fundamental processes, and should be so closely in 
harmony with the correct forms of industry that, so far as they go, 
they will have distinctly vocational values. 

A general summary of the industrial education courses is peculiarly 
difficult to make because of the number and types of courses. Among 
those usually offered in the various junior and senior high schools 
are manual training, or woodworking and cabinetmaking; carpentry; 
shopwork, which includes working in both wood and metal; plumbing; 
home mechanics; elementary sheet metal; electricity and electric 
shop; printing; automobile mechanics; machine-shop practice; 
and mechanical drawing. 

Practical arts for girls are variously named. Probably the general 
designative title for such courses should be Practical Arts for Girls. 
Under such a general heading may be included domestic science, 
household arts, cooking, sewing, a general industrial course, bookcraft, 
home hygiene and care of the sick, and home decoration. 


Objectives of industrial arts instruction 


Industrial arts courses must contribute in a very definite sense to 
the general educational objectives of the secondary school. The aims 
of industrial arts courses, while expressed in many different forms, 
may be briefly stated as follows: 


I. To offer exploratory industrial activities to aid in revealing interests, 
   aptitudes, and vocational possibilities 
   A. Such exploratory values are noted in the following “self-finding” 
      aims: 
      1. To try out individual interests, inclinations, and abilities 
         for industrial pursuits 
      2. To make reliable studies of the conditions, demands, and 
         opportunities in related occupations 
      3. To provide for the industrial needs of pupils who would 
         not remain for academic education alone 
      4. To help pupils more wisely to choose future courses in 
         secondary and higher education 

II. To provide general manual arts experiences of common value to all 
    pupils who elect such work 
   A. Such common values are found in the following “consumers’ 
      or utilizers’” aims: 
      1. To develop “handyman” abilities through repair and construction 
         work for home, shop, and office 
      2. To assist in better choice and use of industrial products 
         and service 
      3. To gain sympathetic attitudes toward other workers and 
         their work 
      4. To appreciate economic production by first-hand experience 
         in production work 

III. To offer opportunity for beginning specialized preparation for entrance 
     into chosen industrial fields 
   A. Such opportunity should provide such guidance and limited 
      training facilities as are included in the following aims: 
      1. To assist in the final selection of a promising career 
      2. To extend the tryout activity to meet the preparatory-vocational 
         needs of pupils who must leave school with a 
         minimum of preparation 
      3. To provide supplementary training through coordinated 
         shopwork, and related schoolwork in mathematics, science, 
         drawing, et cetera 
      4. To offer opportunities for commercial experiences through 
         cooperation with outside industrial agencies 
   B. The more intensive industrial courses to be formulated to serve 
      two purposes: 
      1. To meet the needs of boys with mechanical ability who 
         might profit more by industrial work than by the regular 
         courses of study 
      2. To assist those who remain only a short time in school and 
         who need short, intensive courses of training before entering 
         industry 


Throughout all industrial arts education the solving of problems, 
the training of individuals to meet new and unexpected situations 
intelligently, or training in creative or constructive thinking should 
be emphasized. The following summary statements of the teaching 
aims of industrial arts express these modern objectives quite con- 
cisely, and in a form that should be of real assistance to the student 
or teacher interested in the measurement of understandings in the 
industrial arts field. 


1. Ability to express one's self through planning and constructing 
projects, using common tools and a variety of construction ma- 
terials, typical of industry. 

2. Discovery of aptitudes and reactions contributing to the maturing 
of life interests, both of a vocational and of an avocational char- 
acter. 

3. Understanding of industry and its products and services, together 
with their influence in determining patterns of living in modern 
society. 

4. Ability to read and make working drawings for planning and con- 
structing useful projects typical of modern industry. 

5. The ability to choose industrial products with reference to design, 
pleasing color combinations, and durability; and to maintain and 
service such products. 

6. Growth in abilities and attitudes related to mathematics, science, 
and the language arts, and to work habits, safety practices, and 
cooperation with others. 


Objectives of home economics 


One of the very useful statements of the outcomes of personal and 
family-life education is that resulting from a study made by a joint 
committee of the American Home Economics Association and the 
American Council on Education, and reported by McGinnis and her 
colleagues 2 in a discussion of the measurement of understanding in 
home economics. The outline presented here is an adaptation and 
abbreviation of this highly detailed statement. 


1 Maris M. Proffitt, E. E. Ericson, and Louis V. Newkirk, “The Measurement of 
Understanding in the Industrial Arts.” The Measurement of Understanding, Forty-Fifth 
Yearbook of the National Society for the Study of Education, Part I. University 
of Chicago Press, Chicago, 1946. p. 303-5. 

2 Esther McGinnis and others, "The Measurement of Understanding in Home 
Economics." The Measurement of Understanding, Forty-Fifth Yearbook of the 
National Society for the Study of Education, Part I. University of Chicago Press, 
Chicago, 1946. p. 254-57. 
OUTCOMES OF EDUCATION FOR PERSONAL AND FAMILY LIFE 

I. Personal Adjustment 
   A. Health 
      1. Importance of personal mental and physical health status; 
         food habits, personal hygiene, adequacy of senses, recreation, 
         emotional adjustments 
      2. Relationship of health, safety, control of communicable 
         diseases to personal, family, and community betterment 
      3. First-aid procedures 
   B. Other Personal Adjustments 
      1. Importance of attractive personality, socially acceptable 
         manners, proper family relationships and ways of attaining 
         these 
      2. Importance of developing wholesome attitudes toward sex 
         and ways of achieving and maintaining friendly relationships 
         with both sexes 
   C. Vocational Choice 
      1. Relation of vocational demands to personal qualities, 
         abilities, and training in homemaking 
      2. Considerations involved in payroll jobs versus homemaking 
         for women 

II. Use of Time and Energy 
   A. Ways to compare values and make choices in management of 
      resources 
   B. Ways of substituting skills for money in maintaining standards 
      of living 
   C. Effective assignment of personal energy and skills in relation 
      to satisfactory division of labor in family-life relationships 

III. Use of Money 
   A. Planning short-time and long-time spending 
   B. Banking, credit, budgeting 
   C. Importance of insurance, annuities, social security, and savings 
      for personal and family welfare 

IV. The Family and Children 
   A. The family 
      1. Biological and other factors affecting courtship and successful 
         marriage 
      2. Problems, crises, and adjustments among family members 
      3. Aesthetic and cultural values in the home 
      4. Implications of variety of racial, national, religious, and 
         family patterns in the United States 
   B. The children 
      1. Social significance of bearing and rearing children 
      2. Relation of environment to child's security and normal 
         physical, mental, social, and emotional development 

V. Foods and Nutrition 
   A. Ways of meeting nutritional standards at reasonable cost 
   B. Importance of acceptable table manners 
   C. Principles of food preservation 
   D. Planning suitable meals for individuals and families 
   E. Cookery principles and ability to use them 
   F. Serving food attractively 

VI. Clothing and Textiles 
   A. Choosing harmonious, suitable, and becoming textures, colors, 
      and styles at reasonable cost 
   B. Constructing, repairing, and making alterations in clothing 
      and household furnishings 
   C. Laundering clothes and household fabrics 
   D. Using and maintaining sewing and washing machines and 
      related equipment 

VII. The Home 
   A. Factors for consideration in group planning for housing 
   B. Aesthetic and artistic factors involved in home furnishings 
   C. Caring for furnishings and equipment 
2 MEASUREMENT IN INDUSTRIAL ARTS 


Measurable factors in industrial arts 


The following factors, affecting or revealing achievement in the 
industrial or practical arts, lend themselves more or less readily to 
objective measurement. Educational guidance and instruction in the 
manual and industrial arts should be made much more effective 
through the measurement of these factors and the wise use of the 
results in the classroom. 


1. General ability as indicative of probable success in the manual arts 
2. Special abilities as indicative of success in the manual arts 
3. Interests and attitudes—mechanical, vocational 
4. General information—vocational, mechanical, technical 
5. Degree of trade abilities possessed 
6. Knowledge of diversified vocational activities 
7. Skills in mechanical drawing 
8. Artistic judgment 
9. Essential knowledge in field of home mechanics 


General uses of industrial arts tests 


The possible uses of tests in the industrial arts fields are many. 
One very important use is in the prediction of probable future success 
in specific types of vocational work. Tests in industrial arts are also 
useful for evaluating the outcomes of instruction. A few very useful 
tests for this purpose have been developed. Much work in the re- 
finement of measurement techniques remains to be done here, as in 
most other fields of instruction. Naturally, progress in the develop- 
ment of more effective measuring instruments must follow the more 
exact definition of the unit skills and informational elements compris- 
ing the minimal essentials in industrial arts instruction. 


Visual arts tests 


The ability to visualize and to project an accurate and tangible 
reproduction of a mental image is undoubtedly highly related to 
successful work in certain aspects of the industrial arts. Accordingly, 
manual and industrial arts teachers have been much interested in the 
objective measurement of these abilities. The McAdory Art Test, 
the Lewerenz Test in Fundamental Abilities of Visual Art, and the 
Meier Art Judgment Test are three objective standardized tests that 
contribute to this field. A discussion of these three tests appears in 
Chapter 21 of this volume. 


Mechanical drawing tests 


Brief comments on five standardized tests in mechanical drawing 
are given in this section as representative of tests in this field. The 
Badger Mechanical Drawing Tests are intended for use in a general 
checkup on the pupil's accumulated knowledge of mechanical draw- 
ing rather than as a measure of his drawing ability as revealed by 
his neatness, accuracy, and lettering. Test 1 covers the use of tools, 
linework, dimensioning, and lettering; Test 2 deals with projection; 
and Test 3 measures ability in pictorial drawing. The Castle Mechan- 
ical Drawing Test has five sub-tests which measure ability to identify 
similar parts of an object shown in top and side views, ability to deal 
with dimensions, familiarity with geometric terms, pencil technique, 
and skill in inking. 

Fischer's Mechanical Drawing Tests are in two parts. Part I is a 
test of the technical information necessary in drawing, and no in- 
struments are needed other than a pencil. Part II is a performance 
test and requires the use of drawing instruments. The purpose of the 
Wells-Laubach Knowledge Test of Mechanical Drawing is to measure 
the informational side of the subject, and the Wright Achievement 
Test in Mechanical Drawing is intended for use as a general achieve- 
ment test. 


Industrial arts tests 


The first really satisfactory standardized test of achievement in 
this area was the Nash-Van Duzee Industrial Arts Test. Test 1, 
Scale A, is designed to measure the knowledge of junior and senior 
high-school pupils concerning processes, tools, materials, and information 
used in woodworking. Test 1, Scale B, supplements Scale 
A by measuring the pupil's skill in manipulating hand woodworking 
tools. Test 2 of the test measures ability in mechanical drawing. 

The Middleton Industrial Arts Test is a 190-item test in two parts 
designed to measure the basic knowledge outcomes of high-school 
students enrolled in industrial arts courses. It is primarily a test of 
factual-type learning drawn from the fields of emphasis in the various 
textbooks on the subject. Useful percentile equivalents for individual 
pupil scores and percentiles corresponding to class median 
scores are provided. 
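
The meaning of a percentile equivalent can be made concrete with a short computation. The sketch below assumes one common definition of percentile rank (the percentage of the norm group scoring below a given score, with half of the tied scores counted). The norm-group scores are invented for illustration and are not the published norms of any test.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group below `score`, counting half of any ties."""
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical norm group of ten raw scores on a 190-item test.
norm_group = [62, 75, 81, 88, 94, 101, 110, 118, 127, 140]

print(percentile_rank(101, norm_group))
```

A published table of percentile equivalents simply records such values in advance for every raw score, so that the teacher can read them off without computation.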

Three tests that cover the basic knowledges in the fields of ele- 
mentary woodworking, machine-shop work, and printing are the 
Wells Knowledge Test in Woodworking, the Wells Knowledge Test 
in Machine Shop, and the Wells Knowledge Test in Printing. Among 
other tests in this field are the Hunter Industrial Arts Test and the 
Wells-Laubach Industrial Arts Test. 

An extensive illustration of a performance test in woodworking is 
given on pages 210 to 212 of this volume. A rating scale for driving 
nails is shown on page 209. 


Testing home mechanics 


The chief purpose of the Newkirk-Stoddard Home Mechanics Test, 
which unfortunately is now out of print, was to measure in an ob- 
jective and analytical manner the essential knowledge that pupils 
should acquire from a well-organized course in home mechanics. In 
spite of the fact that it is no longer available, it is presented here as 
an example of sound test validation and testing technique.3 Each 
form of the test as standardized consisted of 36 jobs, comprising 
important tasks in home mechanics. Following the statement of the 
assigned task five steps in the procedure were given in wrong order. 
The response called for the recording of these steps in the correct 
order. The accompanying sample from the test illustrates this type 
of rearrangement item form which is very useful in the industrial 
and practical arts. 

EXCERPT FROM NEWKIRK-STODDARD HOME MECHANICS TEST 4 
No. 2 To Solder Holes in Sheet Metal or Cooking Utensils. 


Procedure: 
(1) Test the finished job for leaks. 
(2) Allow solder a few seconds to cool. 
(3) Apply flux to parts to be soldered. 
(4) Clean parts to be soldered. 
(5) Solder with a hot tinned copper. 


3 Louis V. Newkirk, Validating and Testing Home Mechanics Content. Studies 
in Education, Volume VI, No. 4. University of Iowa, Iowa City, 1932. 

4 Louis V. Newkirk and George D. Stoddard, Newkirk-Stoddard Home Mechanics 
Test. Originally published by Bureau of Educational Research and Service, 
University of Iowa, 1928. Now out of print. 
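
The scoring of a rearrangement item such as the soldering job above may be sketched as follows. Both the key and the scoring rules here are assumptions made for illustration: the presumed order of steps is not taken from the published key, and the source does not state whether credit was all-or-nothing or partial, so both variants are shown under hypothetical function names.

```python
# Presumed correct order of the step numbers for the soldering job
# (an assumption, not the published key):
# clean (4), apply flux (3), solder (5), cool (2), test (1).
ASSUMED_KEY = (4, 3, 5, 2, 1)


def score_rearrangement(response, key=ASSUMED_KEY):
    """Return 1 if the recorded order matches the key exactly, else 0."""
    return int(tuple(response) == tuple(key))


def steps_in_place(response, key=ASSUMED_KEY):
    """Partial-credit variant: count positions that agree with the key."""
    return sum(1 for given, correct in zip(response, key) if given == correct)


print(score_rearrangement([4, 3, 5, 2, 1]))
print(steps_in_place([3, 4, 5, 2, 1]))
```

The all-or-nothing rule is the simpler to apply by hand; the positional count rewards a pupil who has misplaced only the first two steps.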


Informal objective testing in industrial arts 


Objective item forms useful in the field of industrial arts are varied 
in type. The rearrangement item form, illustrated above, is well 
adapted to the measurement of knowledge concerning skills or tech- 
niques. Identification exercises, in which the pupil is to identify 
pictures or actual samples, such as types of woods or tools, by a 
simple recall or a matching item form are also useful. An example 
of such a matching exercise appears on page 203 of this volume. The 
two following illustrations show adaptations of sentence completion 
and multiple-choice items which are useful paper-and-pencil techniques 
in the measurement of industrial arts information.5 


4. Directions: Complete the following sentences by inserting one of 
the words found in the list on the right of the page. The words are to 
be used only once. 


1. A cabinet scraper is used to ....... a surface.                  1. Squareness 
2. A try-square is used for .......                                 2. Smooth 
3. A ....... is used for cutting lengthwise of the grain.           3. Mill 
4. A ....... file is used to shape the edge of a cabinet scraper.   4. Ripsaw 


7. Directions: Underline all the words in each pair of parentheses 
which will make true statements. 


1. (Cypress, pine, redwood, elm) is (are) used in making water tanks. 
2. (Maple, walnut, fir, yellow pine) is (are) soft wood. 


Aptitude and intelligence tests 


Individuals enter wage-earning careers with a great variety of 
educational preparation, with varying degrees of general ability and 
special abilities, all of which will, to a large degree, determine their 
success in a particular occupation. It is important, therefore, that 
the school be supplied with specific information concerning the 
degree of general education, the amount of general ability, and the 
nature of the special abilities demanded by the various wage-earning 


5 Louis V. Newkirk and Harry A. Greene, Tests and Measurements in Industrial 
Education. John Wiley and Sons, Inc., New York, 1935. p. 19-10. 
activities that the young worker may enter upon leaving school. In 
a similar way the school is concerned with determining whether or 
not the high-school pupil possesses (1) an adequate body of occupational 
information, (2) a sufficient background of general education, 
(3) an adequate degree of general intelligence, (4) the special intelligence, 
and (5) the specific skills needed to succeed in these same 
jobs. Educational records will reveal the educational achievement of 
pupils; exploratory courses will indicate their interests in specific 
occupational fields; intelligence tests should be used to suggest the 
amount of training it is practical for a pupil to take; aptitude tests 
will show whether or not an individual possesses the fundamental 
qualities without which he would be handicapped in a certain occu- 
pation; and vocational or trade tests will indicate whether or not a 
pupil possesses the specific skills required for proficiency in a par- 
ticular occupation. 

Both general intelligence and special aptitude tests attempt to 
measure capacity. By means of the former, it is possible to say that an 
individual has the intelligence required for success in any one of a 
considerable group of occupations. By means of the latter, it is 
possible to determine to what extent he possesses the special qualities 
that make it possible for an individual to do successfully and without 
undue difficulty the skills that characterize an occupation. The 
specific problems of using intelligence tests and aptitude tests for 
general guidance purposes are discussed in Chapter 10 of this volume. 


Mechanical aptitude tests 


Mechanical aptitude is the special capacity of the individual to
deal successfully with mechanical devices and to acquire knowledge 
essential to their selection and operation after suitable training has 
been given. Individuals with large measures of mechanical aptitude, 
other things being equal, readily respond to instruction along mechan- 
ical lines. On the other hand, individuals with low mechanical abili- 
ties are likely to react slowly regardless of the quality of instruction 
or the opportunities they are given to work with mechanical devices.

It is estimated that at least 40 per cent of the gainfully employed 
population in the United States is dependent in some measure for
its economic success on the possession of mechanical ability. It thus 
becomes apparent that a knowledge of the pupil's mechanical ability 
is important to the teacher from both the guidance and the instructional
point of view. Knowledge of the fact that an individual has low
or high mechanical aptitude provides an objective basis for guiding 
him into or out of vocations that involve high degrees of these 
abilities. Such knowledge is of particular value to trade and con- 
tinuation school teachers in selecting individuals who are likely to 
profit by the instruction offered. However, it is well to bear in mind 
that mechanical ability tests, like all other tests, must be carefully 
administered and interpreted, and that at best they are suggestive 
but not infallible. 

It is well known that mechanical ability varies widely among
individuals. It is also known that mechanical ability does
not correlate highly with intelligence of the abstract type. The usual 
relationship is around +.40. This does not mean that individuals 
with high intelligence as measured by general intelligence tests do 
not in many cases have high mechanical ability, nor does it mean that 
individuals with low intelligence always have high mechanical ability.
It strongly suggests that there may readily be a concrete aspect of 
intelligence that is not necessarily an accompaniment of intelligence 
of the abstract type. 
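
As a rough gloss on the correlation of +.40 cited above, the coefficient can be squared to estimate how much of the variation in one measure is associated with variation in the other. This is a standard statistical reading, not one stated in the text, and the function name below is purely illustrative.

```python
def shared_variance(r: float) -> float:
    """Return the proportion of variance two measures share, given Pearson r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return r * r


# The text's typical correlation between mechanical ability and
# intelligence of the abstract type.
r = 0.40
print(f"r = {r:+.2f} -> shared variance = {shared_variance(r):.2f}")
```

On this reading, roughly one sixth of the variation is held in common, which agrees with the statement that the two abilities are related but far from identical.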

Only one series of the several available tests of mechanical ability 
will be mentioned here. These are the Minnesota Mechanical Ability 
Tests, which are the outcome of an extensive program of research 
at the University of Minnesota. The tests, which cannot be illustrated 
or described here because of their complexity, include the (1) 
Minnesota Paper Form Board Test, (2) Minnesota Spatial Relations 
Test, (3) Minnesota Assembly Test, (4) Minnesota Interest Analysis 
Test, (5) Minnesota Packing Blocks Test, and (6) Minnesota Card 
Sorting Test. 

Among other well-known tests useful for forecasting mechanical 
performance in school and industry are the Stenquist Assembling 
Tests, the Stenquist Mechanical Aptitude Tests, the MacQuarrie 
Test of Mechanical Ability, the Detroit Mechanical Aptitude Ex- 
aminations, and the O’Rourke Mechanical Aptitude Test. 


Vocational and trade tests 


Intelligence tests and special aptitude tests are primarily prog- 
nostic in their function. Their purpose is to aid in the suitable choice 
of and preparation for specific fields of endeavor. No training for 
or experience in any occupation is assumed before the pupil is subjected
to a vocational aptitude test. Vocational and trade tests differ
from these tests in that they are designed to determine an individual's
ability to perform the work of a specific occupation. Training or
experience in the occupation is assumed, and the test is designed to
measure the results of these or to find out what stage of progress the 
individual has achieved. This type of test is valuable as an aid in 
determining whether an individual should be recommended for a 
particular occupation requiring known standards of proficiency. 
This kind of performance test has the advantage of being typical 
of occupational situations. 

The demand for vocational education under school conditions has 
grown rapidly in the past several decades. There seems to be no 
reason for expecting this demand to diminish in the years imme- 
diately ahead. Rightly organized, vocational education will prove a 
profitable investment for society and will contribute to the making 
of the citizen as well as the worker. In the course of the development 
of a progressive social economy, it may become desirable for every 
individual to acquire a certain amount of vocational education. In 
such a movement there will be an increasing need for objective means 
for evaluating instructional content and for evaluating the results 
of its presentation. The construction, refinement, and use of tests in 
this field will therefore be one of the most significant challenges to 
forward-looking workers in industrial education. 

Trade tests were widely used during World War I and recently 
came into prominence again in World War II for the purpose of 
classifying army recruits according to the services they could best 
render. These tests were designed to measure each recruit's (1) 
ability to do, and (2) capacity for preparing to do. These tests 
measured in each recruit the degree of trade ability already pos- 
sessed as a result of knowledge of, or experience in, a trade. These 
army tests assisted in dividing all recruits into four grades of work- 
ers: (1) novices, who possessed practically no trade knowledge or 
skill, (2) apprentices, who showed a fair degree of proficiency in the 
trade, (3) journeymen, who could be relied upon to perform trade 
jobs satisfactorily, and (4) expert workers, who could be expected to 
meet any emergency in the field. 

Standardized trade tests have not been widely used. However, 
trade unions have developed a few tests for use in many phases of 
trade work. These tests are used in some vocational and trade schools. 
The development of standardized tests of occupational proficiency
suitable for use in rating those who pursue any kind of vocational
preparatory training in public schools seems entirely practical, even 
though little progress in this direction has yet been made. The reason 
for the lack of trade-test development in the public schools is two- 
fold: (1) vocational education, as viewed by most educators, is not 
aimed to develop skilled tradesmen; (2) the need for standardization 
and objectivity of measurement has not generally been regarded as 
serious. 


3 MEASUREMENT IN HOME ECONOMICS 


Measurable factors in home economics 


In the home economics field the following factors affecting or 
revealing degrees of achievement are at present measurable. These 
factors vary from specific, definite, and highly specialized skills to 
broad phases of appreciation. 


1. Specific skills, such as skill in certain phases of sewing, cooking,
decorating, drawing 

2. Knowledge of important phases of home economics, such as knowl- 
edge of foods or household activities 

3. General information, in the matter of foods, clothing, household 
activities 

4. Aesthetic appreciation 

5. Artistic judgment 

6. Essential knowledge in field of homemaking 


Standardized testing in home economics 


Prior to 1930, standardized tests in home economics were almost
entirely factual tests. Since that date, accompanying the increasing 
recognition of the fact that changes in behavior and attitudes must 
be measured, tests and scales have appeared in this field for the 
measurement of various abilities, skills, attitudes, and appreciations. 
Among such instruments are scales for measuring sewing ability, 
quality of foods prepared by the pupil, and attitudes toward home- 
making activities, as well as tests in the areas of house design and 
furnishing, foods, and household management. Brief descriptions of 
several tests and scales and one illustration are given below for the 
three major phases of instruction: (1) foods and their preparation, 
(2) textiles, and (3) household management.

Foods. The Illinois Foods Test, prepared by the test committee
of the Illinois Home Economics Association, measures factual knowl- 
edges in fourteen important aspects of foods, their preparation and 
sanitation, and the etiquette of food service. The Streeter-Trilling 
Food Preparation Test is intended to measure both judgment and 
informational aspects of the subject, but items of the latter type 
predominate. The King-Clark Foods Test measures the knowledges 
and abilities that should result from the study of the typical course 
in foods and cooking. 

Two instruments worthy of note are the Minnesota Check List for 
Food Preparation and Serving and the Brown Food Score Cards. The 
check list provides a five-point scale for the rating of 14 aspects of
food preparation and serving by the teacher after having observed 
individual pupils or small groups of pupils in situations involving 
the opportunity to use the desirable techniques. The Brown Food 
Score Cards, available for 53 items of food commonly prepared in 
the home economics laboratory, provide variously for different foods 
for ratings on such characteristics as appearance, color, consistency, 
flavor, lightness, moisture content, size, taste, tenderness, and texture.
For example, bacon is rated on only four points, whereas nine
of the ten characteristics are used in the rating of baking-powder 
biscuit. An illustration of the score card for waffles is shown on page 
209 of this volume. 

Textiles. The Frear-Coxe Clothing Test measures the knowledge 
of clothing possessed by girls who have completed courses in ele- 
mentary dressmaking in these five special areas: (1) fundamentals 
of construction, (2) care and repair of clothing, (3) hygiene of cloth- 
ing, (4) appropriateness of clothing, and (5) economics of clothing. 
The Murdock Sewing Scale provides samples of 15 types of stitching 
ranging in merit from zero ability to expert ability. The teacher uses 
the scales in rating the pupils’ products in such sewing skills as 
hemming, basting, running stitch, backstitch, overcasting, and com- 
bination stitch. 

The Stevenson-Trilling Tests in Comprehension of Patterns have 
five parts dealing with recognition of parts of a pattern, compre- 
hension of pattern lines, comprehension of notches, alteration of 
patterns, and placing the pattern on the material. The accompanying
item from the third part of the tests illustrates the subject-matter
content and the testing procedure used.

Excerpt from Stevenson-Trilling Tests in Comprehension of Patterns 6


2. This is a pattern of a sleeve and the cuff. Draw a cross on the 
line of the sleeve and the line of the cuff which are to be sewed together. 


[Illustration: pattern of a sleeve and cuff]


Household management. This phase of home economics ability
appears to be measured almost entirely in conjunction with certain 
other skills and abilities in the home economics fields. Therefore, 
mention of tests of household management is included in the follow- 
ing discussion of general tests in home economics. 

General tests in home economics. The Engle-Stenquist Home Eco- 
nomics Test provides separate tests in foods and cookery, clothing 
and textiles, and household management. The tests measure knowl- 
edge necessary for healthful living, as well as knowledge of the prin- 
ciples underlying certain skills in foods, clothing, and management 
of the household. The Unit Scales of Attainment in Foods and House- 
hold Management make use of multiple-choice and occasional simple 
recall items in measuring the factual knowledges necessary for efficient
performance in these two areas of home economics.

The Minnesota House Design and House Furnishing Test meas- 
ures the ability of the pupil to judge and discriminate between good 
and poor design in house and house furnishing samples presented
in the form of photographs. The instrument is intended for the
measurement of the degree to which study of related art results in 
improved taste on the part of pupils. 


Informal objective testing in home economics 


The use of the type of informal objective testing technique recom- 
mended by Tyler, as discussed on page 162 of this volume, is fairly 


6 L. Stevenson and M. Trilling, Tests in Comprehension of Patterns. Published by
Public School Publishing Co., 1927.

recent in the field of home economics. However, Arny 7 presented
numerous and varied samples of objective item groups in the field 
of home economics. Samples of test units are given to illustrate the 
measurement of: 


1. The factual information that an individual possesses

2. Knowledge plus some understanding of the factors and relationships 
that are pertinent in a particular situation 

3. The ability to apply facts and information...to the solution of 
problems... 

4. The actual ability to do—to identify fabrics, to set a table, or to 
select a garment 

5. Taste, appreciation, and attitudes 


It is feasible to present only two groups of sample items here to 
illustrate possible pencil-and-paper testing techniques in home eco- 
nomics. Illustrations of performance tests in clothing appear on pages 
203-4 and 212-13 of this volume. The first sample is of a group of 
items intended for the measurement of factual knowledges plus 
understanding, and is chosen from the foods portion of the home 
economics curriculum. 


SAMPLE OF TEST EXERCISE IN FOODS 8
To which food group does each of the following descriptions apply? 


Food Groups

1. Cereals                 5. Potatoes and sweet potatoes
2. Fruits—citrus           6. Vegetables—green and yellow
3. Fruits—other            7. Vegetables—other
4. Milk and cheese         8. Sugars and sweets


Descriptions of Characteristics

____ 1. Is the cheapest source of Vitamin A.
____ 2. Is an inexpensive source of energy and protein.
____ 3. Provides the highest percentage of riboflavin.
____ 4. Is the poorest source of vitamins.
____ 5. Supplies the American people with the largest amount of
        ascorbic acid.


7 Clara B. Arny, Evaluation in Home Economics. Appleton-Century-Crofts, Inc.,
New York, 1953.
8 Ibid. p. 115-16.

The second sample, from the field of textiles, illustrates a method
of testing the ability to do.


SAMPLE OF TEST EXERCISE IN TEXTILES 9


Directions: In Envelope X are 10 fabrics (labeled A-J). In the blank
in front of each description write the letter corresponding to the fabric
to which it applies. More than one description may apply to the same
fabric.


Descriptions

____ 1. May be combined with sample B to form a monochromatic
        color harmony.

____ 2. May be used as a belt on a dress made of sample G to form
        an analogous harmony.

____ 3. May be used with sample A to form a striking complementary
        harmony.

____ 4. May be used with sample D to form a subdued but pleasing
        complementary harmony.

____ 5. May be combined with sample C to make an analogous harmony
        which would be more becoming to a person with warm
        coloring than with cool coloring.


Topics for Discussion 


1. In what specific ways are the demands made on industrial education
   today different from those of fifteen years ago? Why?

2. What in your opinion will be the effect of further mechanization of
   industry on the objectives of instruction in the industrial arts as
   stated in this chapter?

3. Expand the list of measurable factors either in industrial arts or in
   home economics.

4. What are the arguments favoring the inclusion of visual art tests as
   important measures of industrial arts outcomes?

5. What do you consider to be the greatest weaknesses of the tests
   either in industrial arts or home economics?

6. What future do you see for vocational aptitude tests? Defend your
   position.

7. What are the differences between trade tests and vocational aptitude
   tests?

8. Why are local occupational surveys of great importance for curricular
   and guidance purposes?


9 Ibid. p. 63.

23


Measuring and Evaluating in 
Business Education 


THE FOLLOWING aspects of business education are presented in this
chapter: 

A. Vocational and personal-use aims of business education.
B. Standardized achievement tests in content areas.
C. Standardized achievement tests in skill areas.
D. Informal objective testing in business education.
E. Measurement of interests in business education.
F. Predictive tests in business education and industry.

Business education is a field in which the emphases have changed 
materially during the last several decades. Originally planned only 
for pupils wishing training in office techniques for vocational use, the 
curriculum of business education during the early years of the cen- 
tury consisted largely of such courses as typewriting, shorthand, 
bookkeeping, and penmanship. The first three of these continue today 
to be the courses most commonly offered, but they have been supplemented
by a wide variety of new courses.




1 AIMS AND OBJECTIVES IN BUSINESS EDUCATION 


Major purposes of business education 


The major aspects of business education as outlined by Price and
his colleagues 1 consist of: (1) basic business education, with emphasis

1 Ray G. Price and others, “Business Education.” Encyclopedia of Educational
Research, Revised edition. Macmillan Co., New York, 1950. p. 115-26.

on consumer efficiency, (2) technical business education,
stressing training for jobs, and (3) distributive education, concerned 
primarily with the selling of goods and services. Consumer problems 
have received direct attention in business education only during the 
last fifteen to twenty years, and distributive education has been 
increasingly emphasized of late. Programs of technical business edu- 
cation have been modified least. Even here, however, the traditional 
vocational objectives have been supplemented by personal-use values, 
particularly in typewriting, as the number and popularity of courses 
in personal typewriting increase. 

Basic business education courses are best represented by such non- 
technical subjects as economic geography, consumer problems, book- 
keeping, business law, and junior business training. Technical busi- 
ness education includes courses in typewriting, shorthand and tran- 
scription, general clerical training, and business arithmetic, but here 
the personal-use as well as the technical values receive direct atten- 
tion. The variety of courses in distributive education is great, but 
merchandise information, salesmanship, retailing, store methods and 
systems, and occupational relations probably are the most widely 
offered. 

Adjustment of the individual to his business environment is a 
primary aim of business education. Involved in this are the two fol- 
lowing types of major aims, the first of which applies to non-technical 
needs and the second of which applies to vocational needs. 2


1. Training in those phases of business that concern every member of 
organized society 


a. Education of persons to be intelligent consumers of the services 
of business 


b. Education of persons to a clear understanding of the nation’s 
economy 


2. Training in technical or specific vocational skills and abilities 


a. Training in specific job skills 


b. Development of ability to use these skills in the environment 
of business 


2 Herbert A. Tonne, Estelle L. Popham, and M. Herbert Freeman, Methods of 
Teaching Business Subjects. Gregg Publishing Division, McGraw-Hill Book Co., 
Inc., New York, 1949. p. 8.

So rapidly has the shift of emphasis taken place recently that
the first of these major aims, the provision of business education for 
the types of business activities met by all persons, has probably come 
to be the dominant purpose in business education at the secondary- 
school level. 


2 STANDARDIZED ACHIEVEMENT TESTS IN BUSINESS 
EDUCATION 


Problems of measurement in business education 


Course offerings common to business curricula may be classified 
into content subjects and skill subjects. The content subjects, such 
as business English and law, economic geography, general business, 
and junior business training, place major emphasis upon the devel- 
opment of knowledges, attitudes, and abilities to apply information 
to practical situations. On the other hand, subjects such as book- 
keeping, typewriting, shorthand, business arithmetic, machine cal- 
culation, and office practice, in which skills receive the major em- 
phasis, subordinate the necessary basic knowledges to the ultimately 
important skills. 

Tests in the content subjects differ little in type from those that 
have been discussed and illustrated in Chapters 18 and 20 of this 
volume. Moreover, there appear to be very few standardized achievement
tests available for the content subjects of business education. 3
Therefore, only passing attention will be given here to testing methods
in the content subjects in this field. Tests for the measurement of
business skills are predominantly of the performance type, and it is 
of great importance that they closely approximate conditions under 
which the skills will be used in an office or in the personal life of the 
pupil. It is evident, therefore, that the measurement problems in 
business education are of two rather distinct types. In the discussions 
and illustrations of testing techniques in the following sections of 
this chapter, relatively more attention will therefore be given to the 
skill subjects than to the content subjects. 

3 For a variety of objective, non-standardized tests in content areas see Mathilde
Hardaway and Thomas B. Maier, Tests and Measurements in Business Education,
Second edition. South-Western Publishing Co., Cincinnati, Ohio, 1952. Chapter 5.

No attempt is made to distinguish tests for vocational objectives
from tests for personal use objectives in the discussions and illustrations
that follow. So far as the authors know, there are no standardized
tests designed primarily for personal use objectives. The
distinction is perhaps relatively unimportant, furthermore, because 
of the fact that sufficient overlaps in course content and method 
probably exist in several subjects to permit the use of one test for 
the measurement of both types of desired outcomes. 


Standardized tests in content areas 


Testing in business English differs from testing in other English 
courses mainly in the type of emphasis required. The content of tests 
in this field is keyed to the needs of business rather than to the more 
general needs of high-school pupils as a whole. Economic geography 
tests represent a specialized treatment of geography content from the 
standpoint of its significance in economics and commerce. Tests in 
such subjects as business law, general business, and junior business 
training include content material of a type the pupil seldom has en- 
countered previously in his school career, but the emphases in all 
tests in this area are primarily upon knowledges, attitudes, and 
abilities to apply the results of learning. 

Tests in commercial and business law are often devised to accom- 
pany particular textbooks or for use in commercially-sponsored or 
state scholarship contests. Consequently, there are few standardized 
tests in this area. The Tate Economic Geography Test is the only 
standardized test in its field that has come to the attention of the 
authors. Thus far there are no standardized tests for use solely in 
measuring ability in business English. Such tests as seem to be 
adapted for the evaluation of business English appear as subtests 
in more general instruments. 

A few standardized tests are available for general business and 
junior business training. Among these are the Thompson Business 
Practice Test, which is intended for use in such courses as junior 
business training, introduction to business, elementary business train- 
ing, and everyday business, and the Smith General Business Training 
Test. 

The accompanying illustrations of true-false, multiple-choice, 
matching, and completion items from three of the tests named above 
show the basic item forms used in measuring abilities in the content 
subjects of the business curriculum.

Excerpt from Parke Commercial Law Test 4
( ) 24. Treasury stock is stock that has never been issued. 
( ) 25. In a bailment there is delivery of both title and possession. 


( ) 26. An ordinary deposit of money in a bank, subject to check, 
is a bailment. 


Excerpt from Tate Economic Geography Test 5

( ) 66. The leading city for jute manufacturing is: 1. Dundee, Scotland.
2. Calcutta, India. 3. Boston, Massachusetts. 4.
Montreal, Canada.

( ) 67. The world’s chief diamond cutting centers are Antwerp and:
1. New York. 2. Berlin. 3. Amsterdam. 4. Johannesburg.

( ) 68. An unusual industry of the Netherlands is: 1. dairying. 2.
bulb culture. 3. perfume making. 4. hat manufacturing.


Excerpts from Thompson Business Practice Test 6


Part II
1. Consignor          1. An allowance made to dealers ........ ( )
2. Consumer
3. Job lot            2. One who sells goods ................. ( )
4. [illegible]        3. One who uses goods .................. ( )
5. Retail
6. Trade discount     4. To trade in large quantities ........ ( )
7. Vender
8. Wholesale          5. An odd assortment of merchandise .... ( )

Part IV
6. When a registered letter containing money is lost in
   the mail, the loss is borne by the— ............... ( )
7. All telephone calls in one city or through one exchange
   are known as— .................................... ( )
8. The money of the United States may be divided into
   two classes, metallic and— ....................... ( )


4 L. A. Parke, Commercial Law Test. Published by Bureau of Educational
Measurements, Kansas State Teachers College, Emporia, 1933.

5 Donald J. Tate, Tate Economic Geography Test. Published by Bureau of Educational
Measurements, Kansas State Teachers College, Emporia, 1940.

6 James M. Thompson, Thompson Business Practice Test. Published by World
Book Co., 1937.

Standardized tests in stenography


Most of the standardized achievement tests in stenography are 
limited to the measurement of ability to take dictation, to read 
shorthand, and to transcribe shorthand notes. In too few cases have 
such important stenographic abilities of a non-shorthand type as 
filing, telephoning, and meeting callers been considered. 

Among the well-known tests of stenography are the Hiett Stenog- 


spectively for use with the Gregg Shorthand Manual, Anniversary 


Education Survey Tests in Junior and Senior Shorthand, the Turse- 
Durost Shorthand Achievement Test, and the Rollinson Diagnostic 
Shorthand Tests. The latter consist of twelve separate tests, each 
based on a chapter of the Gregg Shorthand Manual, Anniversary 
edition. Each test measures vocabulary, accuracy and speed of 
handwriting, taking dictation, word meaning, and rate and com- 
prehension of reading. Teachers’ charts for diagnostic use are 
provided, and a sheet to be torn from each pupil’s test booklet and 
used by the teacher in indicating pupil errors serves as an excellent 
tool for individual diagnosis. Examples of the testing techniques used
in the word meaning and reading comprehension tests are given in the
accompanying illustration for the tests on Units 1 to 3 and Units 19
to 21. 


Excerpts from Rollinson Diagnostic Shorthand Tests 7 


READING—WORD MEANING


Directions: Read all the shorthand. The last word of each sentence 
is left to your choice—you may choose one of the four words set in type. 


margin. Accuracy of reading, as shown by your choices, and the quantity 
completed will be considered in the score. 


Illustration: [shorthand outline] (1) ready (2) lamb (3) raid (4) lad. ..4...
[shorthand outline] (1) oat (2) office (3) up (4) under. ......
[shorthand outline] (1) evident (2) fortune (3) for (4) avert. ......


7 Ethel A. Rollinson, Rollinson Diagnostic Shorthand Tests. Published by Gregg
Publishing Co.

READING—COMPREHENSION


Directions: Read all the shorthand carefully. An occasional outline 
makes no reading sense in the place used. When you find such an out- 
line, cross it out. Your accuracy in locating all the unnecessary outlines 
will be considered in the score. 


Illustration: [shorthand passage not reproduced]


Two work-sample tests in stenography have appeared rather recently:
the Seashore-Bennett Stenographic Proficiency Tests 8 and
the SRA Dictation Skills Test. 9 The designation as work-sample tests
is justified by the simulation of a practical office situation provided 
by the testing procedures. Both tests employ phonograph recordings 
of letters and both require the pupil to take shorthand dictation as 
the records are played. The Seashore-Bennett test then requires the 
pupil to transcribe his notes and provides for a semi-subjective scor- 
ing of the resulting letters on six aspects of a “mailability” criterion. 
The SRA test does not require transcription but rather tests the pupil 
on speed and accuracy in terms of his ability to supply the correct 
words to fill the blanks in letters printed in test booklets. 


Standardized tests in typewriting 


Testing practices in typewriting seem to be dominated quite largely 
by the types of tests set up by typewriter manufacturers in the con- 
tests they have sponsored. Tests based solely on speed and accuracy 
over standardized material, as measured by a combination of credits 
for strokes per time unit and weighted deductions for errors, are 
therefore in the majority. Such tests seldom require use of the 
numbers and symbols keys of the typewriter, but select material for 
which mainly the letters are used. The present trend, not yet much 
evidenced in standardized tests, is toward the broadening of tested 
skills to include abilities in placing letters on a page, in use of the 


8 Published by Psychological Corporation, 1946. 
9 Published by Science Research Associates, 1947.

tabulation keys, and in typing rough drafts, and also toward a meaningful
method of penalizing for errors in terms of their importance
and correctibility.
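
The speed-and-accuracy scoring described above, a credit for strokes per unit of time less a weighted deduction for errors, can be sketched as follows. This is only an illustration: the function name, the five-strokes-per-word convention, and the ten-word penalty per error are assumptions for the example and are not taken from any test named in this chapter.

```python
def typing_score(strokes: int, minutes: float, errors: int,
                 strokes_per_word: int = 5, penalty_words: int = 10) -> float:
    """Net words per minute: credit for strokes, weighted deduction per error.

    strokes_per_word and penalty_words are illustrative assumptions,
    not values prescribed by any published typewriting test.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    gross_words = strokes / strokes_per_word          # credit for speed
    net_words = max(gross_words - penalty_words * errors, 0.0)  # error deduction
    return net_words / minutes


# 3000 strokes in 10 minutes with 4 errors:
# 3000 / 5 = 600 gross words; 600 - 40 = 560 net words; 560 / 10 = 56.0
print(typing_score(strokes=3000, minutes=10, errors=4))
```

A scheme that penalizes errors according to their importance and correctibility, as the newer trend suggests, would replace the single penalty_words value with a separate weight for each class of error.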




Standardized tests in bookkeeping 


Bookkeeping tests are almost entirely devised for measuring vo- 
cational objectives, although, very properly, bookkeeping for per- 
sonal use is taught in some high schools. A large portion of the tests 
in bookkeeping, furthermore, are designed to accompany particular
textbooks and consequently are neither adequately validated nor
standardized for general use.

The Elwell-Fowlkes Bookkeeping Tests consist of nine subtests
and measure such knowledges and skills as general theory, journalizing,
adjusting entries, closing the ledger, and preparing statements.
The two tests, one for use at the end of each semester of a one-year
course, have considerable diagnostic value. Only one illustration from
a bookkeeping test is given here, for many of the significant testing
techniques cannot be pictured adequately in restricted space. The
accompanying illustration from the Elwell-Fowlkes Bookkeeping
Tests shows a testing technique that involves the indication by pupils
of bookkeeping operations to be performed in the classification of
accounts rather than the actual manipulation of figures in bookkeeping
operations.


Excerpt from Elwell-Fowlkes Bookkeeping Tests, Test 2 ¹⁰


Directions. Indicate by a check mark (√) whether the balances of the
following accounts taken from the ledger of G. H. Burch would appear
as debits or as credits. Show by entering in the column headed “Classification”
the correct classification numbers from those given below.

1. Asset
2. Deduction from Asset
3. Liability
4. Proprietary Interest (Proprietorship)
5. Income
6. Deduction from Income
7. Expense
8. Deduction from Expense


                                       DEBIT   CREDIT   CLASSIFICATION
SAMPLE. Cash ......................... (   )   (   )    (   )
1. Merchandise Purchases ............. (   )   (   )    (   )
2. Accounts Receivable ............... (   )   (   )    (   )
3. Salaries .......................... (   )   (   )    (   )
4. Unexpired Insurance ............... (   )   (   )    (   )
5. Accrued Expenses .................. (   )   (   )    (   )


10 Fayette H. Elwell and John G. Fowlkes, Elwell-Fowlkes Bookkeeping Tests,
Test 2. Published by World Book Co., 1929.


Another set of diagnostic tests is the Breidenbaugh Bookkeeping 
Tests. These cover journalizing, adjustments, balance sheet, state- 
ment of profit and loss, closing entries, and worksheet. Four tests are 
provided for a one-year course, and the final one of the four can be 
used as a prognostic test for advanced bookkeeping courses. 

The Shemwell-Whitcraft Bookkeeping Tests are similar to the 
tests previously mentioned in that they make separate provision for 
each semester of a two-semester course. Such abilities and skills as 
journalizing, making adjusting and closing entries, classifying ac- 
counts, defining important business terms, and solving practical 
bookkeeping problems are tested. These instruments are survey 
rather than diagnostic in type. 


Standardized tests in commercial arithmetic 


The Kinney Scales of Problems in Commercial Arithmetic are 
among the few standardized instruments in this field known to the 
authors at this time. These tests are based on problem situations 
collected from merchants, businessmen, and teachers. There are 
sections on each of the four fundamental processes in arithmetic, on 
aliquot parts, and on problems of various types arranged in four 
scales for use at specified times during a two-semester course. The 
scales have considerable diagnostic significance, and provide a 
tabulation form on the back of each booklet for the pupil’s use in 
analyzing his weaknesses. The Gilbert Business Arithmetic Test is 
the only other standardized test in this field known to the authors. 


3 INFORMAL OBJECTIVE TESTING IN BUSINESS EDUCATION 


Although the educational literature on informal objective testing
in business education has until recently not been extensive, the attention
of business educators has turned definitely toward measurement
techniques during the last few years. A number of articles on
test construction in distributive education and technical business
education are listed at the end of this chapter. The recommendations 
given below for informal objective testing methods in several busi- 
ness education courses supplement the preceding sections of this 
chapter in furnishing suggestions to the student for practicable test- 
ing methods. 

Carlson suggested a procedure for testing bookkeeping skills that 
eliminates the subjectivity and tedium involved in scoring bookkeep- 
ing sets step by step, and that distinguishes between the training 
function of bookkeeping sets and the testing function of bookkeeping
tests.¹¹ He recommended that informal objective items be constructed
and used periodically during the course. The accuracy of
the pupil's work and his understanding of the set of books he is 
keeping can be measured by keying objective items to the set in such 
manner that the pupil obtains the answers by actual reference to his 
work in the books. Not only is the accuracy of the pupil's work tested 
by his answers to the objective items but his ability to use the set 
with understanding is tested by his ability to locate the answers. 

Suggestions were also offered by Carlson for the use of problem- 
type tests in bookkeeping and business arithmetic that eliminate the 
all-or-none characteristic of lengthy problems scored on the sole 
basis of the final result.¹² He recommended, for example, that a
favorite form of bookkeeping test, based upon the use of figures and 
actual computations on a working sheet, balance sheet, and statement 
of profit and loss, be replaced by a test of ability to place check 
marks in the proper places on a similar set of forms to indicate where 
the figures should be filled in. Such a test would embody the checking 
technique pictured in a preceding section of this chapter for the 
Elwell-Fowlkes Bookkeeping Tests, although it would differ greatly 
in form and complexity from the sample pictured. Carlson pointed
out that mere accuracy in addition and subtraction is all that is required
for satisfactory completion of a bookkeeping problem if the pupil
knows where the figures should be posted. This technique can be
extended to the preparation of final financial reports, he pointed out,
by asking the pupil to classify accounts rather than to prepare the
complete statement.
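
The contrast Carlson drew between all-or-none scoring and a placement-checking test can be illustrated with a small sketch. The account names, form columns, and function names below are hypothetical, chosen only to show how scoring each placement separately gives partial credit where scoring the final result alone gives none.

```python
# Hypothetical answer key: account -> column on the form where its figure belongs.
KEY = {
    "Cash": "balance sheet, debit",
    "Accounts Payable": "balance sheet, credit",
    "Sales": "profit and loss, credit",
    "Salaries": "profit and loss, debit",
}


def placement_score(responses, key=KEY):
    """Count the accounts the pupil checked in the correct column."""
    return sum(1 for account, column in key.items()
               if responses.get(account) == column)


def all_or_none_score(responses, key=KEY):
    """Give credit only when every placement is correct."""
    return 1 if placement_score(responses, key) == len(key) else 0


pupil = {
    "Cash": "balance sheet, debit",
    "Accounts Payable": "balance sheet, credit",
    "Sales": "profit and loss, debit",      # misplaced
    "Salaries": "profit and loss, debit",
}
print(placement_score(pupil), "of", len(KEY))  # 3 of 4
print(all_or_none_score(pupil))                # 0
```

Under the all-or-none rule a single misplaced account costs the pupil the whole problem, whereas the placement count still credits the three accounts he classified correctly.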


11 Paul A. Carlson, The Measurement of Business Education, South-Western Pub- 
lishing Company Monograph, No. 18. South-Western Publishing Co., Cincinnati, 
Ohio, 1932. p. 17. 

12 Ibid. p. 16.


4 MEASUREMENT OF INTERESTS IN BUSINESS EDUCATION


One of the few standardized interests inventories that restricts 
itself to one area of education or occupation is found in the field of 
business education. This is the Cardall Primary Business Interests 
Test, which is intended to locate the area of business activity in which
the pupil is most likely to have satisfying experiences. The instru- 
ment, by the use of a differential scoring technique, produces scores 
indicating relative interests in the five areas of accounting, collec- 
tions and adjustments, sales-office work, sales-store work, and steno- 
graphic-filing work. This instrument, in common with interest 
inventories in general, should not be expected to furnish a very exact 
prediction of success, but the supplementary use of results from 
aptitude tests, achievement tests, and other measuring instruments, 
as well as personal data concerning the pupil's background and school 
progress, should afford useful guidance information. 


5 PREDICTIVE TESTS IN BUSINESS EDUCATION 


Types of predictive tests


Tests for prediction of future success in the fields of business education
fall into two main groups. The first of these includes the prognostic
and aptitude tests, primarily in the fields of shorthand and
stenography, that are used in the guidance of pupils into or away from 
business education curricula. The second includes the National Busi- 
ness Entrance Tests and clerical aptitude tests for which a major 
function is the furnishing of information to prospective employers 
concerning the ability of applicants to perform certain types of office 
work. The first type of test presupposes little or no previous experi- 
ence with the business education subjects for which the tests are 
given, whereas the latter assume a certain amount of experience,
either specific or general, with the types of abilities necessary for the
form of office work in question.


Prognostic and aptitude tests in shorthand 

Several tests are available for use in predicting pupil success in 
shorthand and stenographic ability. Of these, the Hoke Prognostic 
Test of Stenographic Ability is probably the best known. It has parts
measuring motor reactions, speed of writing, quality of writing,
speed of reading, memory, spelling ability, and ability to use sym- 
bols. It furnishes a score intended to be predictive of success in 
shorthand courses in high school. 

Two other tests of similar types are the Bennett Stenographic
Aptitude Test and the Turse Shorthand Aptitude Test. The former 
includes a transcription test from numbers to symbols and back to 
numbers and a test in recognition of correct spellings of words. The 
latter consists of seven parts that test stroking, spelling, phonetic 
associations, symbol transcription, word discrimination, dictation, 
and word sense. Because of the unique method of measurement of 
transcription in these instruments, the accompanying illustration of 
the directions portion of the symbol transcription subtest from the 
Turse Shorthand Aptitude Test is presented here. 


Excerpt from Turse Shorthand Aptitude Test ¹³


Directions. This is a test to determine how rapidly and how accurately 
you can decipher or transcribe the following shorthand sentences by using 
the alphabetic key which is provided below. Letters which are not pronounced
are not written in the shorthand, but all missing letters must be
supplied in every word in your answer. Study the sample given below. 
You will find that the word “mean” is written in shorthand “m-e-n” be- 
cause the “a” is not pronounced. Each sentence is complete and gram- 
matically correct. You will be allowed ten minutes. 
Key
Letter: r a v s e t h i m n l o f d
Symbol: [shorthand symbols not reproduced]

Sample 1. [shorthand sample not reproduced]


Clerical aptitudes tests 


Three tests that measure clerical aptitudes as a basis for classifi- 
cation and counseling purposes at the junior and senior high-school 
levels are the Detroit Clerical Aptitudes Examination, the Detroit 
General Aptitudes Examination, and the Thurstone Clerical Ability 


13 Paul L. Turse, Turse Shorthand Aptitude Test. Published by World Book Co.

Test. Making use of eight subtests, the Detroit Clerical Aptitudes
Examination measures motor skills, visual imagery, and educational 
skills that are basic to performance in clerical work. The clerical 
aptitudes score obtained from the Detroit General Aptitudes Ex- 
amination makes use of nine parts, seven of which are common to 
the two tests. The accompanying sample item from the subtest in 
which the pupil is to arrange the elements of disarranged pictures in 
proper form is one of the visual imagery tests. The authors of the 
tests believe that visual imagery is involved in the manipulation of 
calculating machines, adding machines, and perhaps typewriters.¹⁴


Excerpt from Detroit Clerical Aptitudes Examination ¹⁵


Trade tests of clerical aptitude 


Tests of clerical aptitude are used most frequently as trade tests 
by business concerns attempting to select employees who are best 
fitted to do their work satisfactorily. The aim may be to measure 
either general clerical aptitude by tests that are closely similar to 
verbal intelligence tests or aptitude for specific types of perform- 
ances by tests that measure the unique traits thought to be essential 
for such skills as filing or stenography.

The Chicago Test of Clerical Promise measures four abilities essential
to success in office work—facility and accuracy in the use of


14 Harry J. Baker, Paul H. Voelker, and Alex C. Crockett, Teacher's Handbook
for the Detroit General Aptitudes Examination. Public School Publishing Co.,
Bloomington, Ill. p. 19.

15 Harry J. Baker and Paul H. Voelker, Detroit Clerical Aptitudes Examination.
Published by Public School Publishing Co., 1938.

English, memory, attention to detail, and ability to use number compounds.
It does so by means of subtests on (1) accuracy in spelling,
(2) simple arithmetic, (3) memory for oral instructions, (4) checking
names and numbers, (5) vocabulary usage, (6) arithmetical
reasoning, and (7) accuracy in copying.

The O'Rourke Clerical Aptitude Test, Junior Grade, consists of 
two subtests on clerical problems and on reasoning problems. The 
various parts measure abilities of several clerical types, information 
and factual knowledge important in office work, and reasoning 
ability with materials chosen from fields of business. 

Two other tests serving the same general purposes are the Minne- 
sota Vocational Test for Clerical Workers and the Minnesota Rate of 
Manipulation Test. The first consists of pairs of numbers and pairs 
of names. The pupil is expected to compare the items of each pair 
and to check those pairs for which the items are identical in every 
respect. The second test is of the performance type, and requires the 
pupil to change the positions of cylindrical blocks in a form board in 
two designated ways. This instrument is used in the selection of 
both office workers and shop workers. 
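
The comparison task in the first of these tests lends itself to a brief illustration. The sketch below builds an answer key for a few invented pairs and scores a pupil's check marks; the sample pairs, the function names, and the rights-minus-wrongs scoring rule are assumptions made for illustration, not details taken from the test manual.

```python
def answer_key(pairs):
    """Return True for each pair whose items are identical in every respect."""
    return [left == right for left, right in pairs]


def score_checks(pairs, checked):
    """Score a pupil's check marks as rights minus wrongs (assumed rule).

    `checked` holds one boolean per pair: True if the pupil marked the
    pair as identical. Only marked pairs affect the score.
    """
    key = answer_key(pairs)
    rights = sum(1 for k, c in zip(key, checked) if c and k)
    wrongs = sum(1 for k, c in zip(key, checked) if c and not k)
    return rights - wrongs


# Invented sample items in the style of number and name comparison.
pairs = [
    ("66273894", "66273984"),                # digits transposed
    ("527384578", "527384578"),              # identical
    ("John C. Linder", "John C. Lender"),    # one letter differs
    ("New York World", "New York World"),    # identical
]
print(answer_key(pairs))                               # [False, True, False, True]
print(score_checks(pairs, [True, True, False, True]))  # 1
```

Deducting wrong check marks from right ones discourages the pupil from checking every pair indiscriminately.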


Salesmanship aptitude test 


A test for Ability To Sell, one of the George Washington University 
Series, measures characteristics important to store salesmanship. The 
six parts of the test measure judgment in selling situations, memory 
for names and faces, observation of behavior, ability to learn selling 
points in merchandise, ability to follow store directions, and ability 
to solve selling problems. 


National Business Entrance Tests 


The field of business education is unique among secondary-school 
areas of instruction in having developed for some of its important 
measurement needs a cooperative, comprehensive series of objective 
tests. The National Clerical Abilities Tests were sponsored by the 
Eastern Commercial Teachers Association when they originated in 
1938. They later became a joint enterprise of the National Council 
for Business Education and the New England Office Management 
Association. The present tests, known as the National Business Entrance
Tests, are jointly sponsored by the National Office Management
Association and the United Business Education Association.
One of the three present forms of the battery is available to schools 
for general testing. The other two series, for use only by employers 
or in official test centers, will not be discussed here, although they are 
similar in many respects to the form available for school use. 

The aims and philosophy of the National Business Entrance Tests
were stated as follows: 


They have been especially designed for use in hiring office workers. 
The tests measure the applicant's skills necessary to successfully handle 
a beginner's work in one or more of the common types of office activity
... To test the applicant's skill in handling a beginner's job, the
most valid type of test is one that simulates the work conditions that 
would naturally confront the individual. The National Business En- 
trance Tests . . . are excellent determiners of sustained skill in the various 
fields they cover.¹⁶


The Long Form General Testing and Screening Series consists of 
six tests. The one on fundamentals and general information is given 
to all candidates, but the individual pupil chooses anywhere from 
one to all five of the skills tests for bookkeeper, typist, stenographer, 
calculating machine operator, and general office clerk. Each test 
requires approximately two hours of time. They are easy to admin- 
ister and score. 


Topics for Discussion 


1. What are the two major functions served by business education in
the modern school?

2. Distinguish between the content and skill subjects of business education
and give examples of each.

3. What is the status of standardized testing in the content areas of
business education?

4. Indicate some of the shortcomings of most standardized tests in
shorthand and typewriting.

5. Discuss the value of measuring interests in the field of business education.

6. What are the two main types of predictive tests in business education?
Illustrate each type by naming and describing a standardized
test.


16 T. W. Kling, “National Business Entrance Tests.” Balance Sheet, 32:96;
October 1950.

7. Discuss the nature and purposes of the National Business Entrance
Tests. In what way are they unique? 

8. Describe some recommended informal objective testing procedures 
for bookkeeping and business arithmetic.


Measuring and Evaluating
in Health and Physical Education 


THIS CHAPTER presents a brief summary of the following aspects of 
measurement and remediation in health and physical education:


A. Status and aims of health education.

B. Measurement and evaluation in health education.

C. Diagnostic and preventive measures for the improvement of
health.

D. Philosophy and objectives of physical education.

E. Measurement in physical education.

F. Diagnosis of physical condition.


The rather closely related fields of health and physical education 
are exceedingly important in the school, although they seem not to 
be as much favored in the curricular setup in most schools as are 
the academic areas. The national and economic importance of good 
health is obvious, but the loss to society from illness and unneces- 
sary death is not so apparent as the loss to the individual. Physical 
education perhaps occupies a more important place in the society 
of today than was true earlier in the history of man because of the 
modern need for physical activities to counteract the effects of the 
physically inactive lives led by many persons. 


1 SCOPE AND AIMS OF HEALTH EDUCATION


Considered in its broadest sense, health education includes much 
more than can be treated in this chapter. The mental health, or 
mental hygiene, aspect is considered in Chapter 11. Some of the
health education activities, such as health service, mental hygiene
of the classroom, and recreation, are not measurement problems as 
much as they are supervisory or administrative problems. This chap- 
ter deals primarily with health education and physical education 
measurement, with some attention to the aims and objectives of 
these fields. Although they are in the main treated separately here, 
health and physical education are not equally inclusive terms. Instead,
the latter may be considered as one aspect of the former.


Scope of health education 
Strang stated the scope of health education as follows: 


Health education is concerned with healthy growth, the prevention of 
disease, the correction of physical impairments, and the building of a 
healthful environment. At its best health education builds physical se- 
curity and contributes to self-realization, social security, and the welfare 
of society.¹


This statement makes clear the great responsibility of the school 
for the health education of the child, especially because the child 
not only undergoes physical as well as intellectual experiences of 
rather broad scope in his school activities but also because his school 
actions and personality are influenced by his physical activities out- 
side of and beyond the control of the school. 


Aims of health education 


The aims stated by a joint committee of educators and physicians ²
indicate the purposes underlying health education. 


1. To instruct children and youth so that they may conserve and improve
their own health.

2. To establish in them the habits and principles of living which thru- 
out their school life and in later years will aid in providing that 
abundant vigor and vitality which are a foundation for the greatest
possible happiness and service in personal, family, and community
life.


1 Ruth Strang, “Health Education.” Encyclopedia of Educational Research, Revised
edition. Macmillan Co., New York, 1950. p. 529.
2 Health Education. Report of Joint Committee on Health Problems in Education
of the National Education Association and the American Medical Association.
Second revision. National Education Association, Washington, D. C., 1941. p. 15.

3. To promote satisfactory habits and attitudes among parents and
adults thru parent and adult education and thru the health education 
program for children, so that the school may become an effective 
agency for the advancement of the social aspects of health education 
in the family and in the community as well as in the school itself. 

4. To improve the individual and community life of the future; to 
insure a better second generation, and a still better third generation; 
to build a healthier and fitter nation and race. 


Comparatively minor differences in topics comprising the health 
curriculum are found from school to school in the secondary grades. 
The major areas of study have to do with: (1) such health habits 
as cleanliness, food and nutrition, sleep and rest, posture and exercise,
dental hygiene, ventilation, clothing, first aid and safety, and
effects of alcohol and narcotics, and (2) attitudes of courage, helpfulness,
consideration of others, independence, adaptability, and
enjoyment of daily living.³


2 MEASUREMENT AND EVALUATION IN HEALTH EDUCATION 


Classroom testing in the field of health education by the use of 
paper-and-pencil tests has not attained any significant state of 
development. Since standardized health tests are not numerous, the 
discussion to follow is necessarily brief. Some of the measuring instruments
discussed in the latter part of this chapter under the heading
of physical education have at least indirect significance as health
measures.


Health knowledge tests 

Several good health knowledge tests have been published since 
1930, but most of the older tests doubtless have little significance 
today because of the tremendous advances that have been made in 
nutrition since their publication and the importance of correct 
knowledge concerning nutrition as a basis for making dietary 
decisions. 

Strang pointed out that norms are of no great significance for 
health knowledge tests because the discovery of individual pupil 
variations and their meaning is much more fundamentally important 


3 Bernice E. Leary, A Survey of Courses of Study and Other Curriculum Materials
Published since 1934. U. S. Office of Education Bulletin, 1937, No. 31. U. S.
Government Printing Office, Washington, D. C., 1937.

than are comparisons of individual or group test performance with
a norm.⁴


Sample items from two health knowledge tests are given in
accompanying illustrations as representative of the testing technique
and content of modern tests in this field. One of the best-known
tests of this type is the Gates-Strang Health Knowledge Test, which
is a revision of an earlier test by the same authors.


Excerpts from Gates-Strang Health Knowledge Test ⁵


8. If oil or kerosene has caught fire, the worst thing to use to put
out the fire is
a. Salt.
b. Sand.
c. Water.
d. [illegible]
e. A heavy rug.

14. The person who lives the most useful life is the one who
a. Tries to be the leader of every group.
b. Works for the welfare of those around him.
c. Seeks his own pleasure first of all.
d. Thinks of his own health all of the time.
e. Tells other people which medicine to take when they feel sick.

37. The very small blood vessels connecting the arteries and veins
are called
a. Tissues.
b. Pores.
c. Secretions.
d. [illegible]
e. Capillaries.

40. We say that a person is immune from a disease when
a. He has not been near sick persons.
b. His body has made substances that protect it from the bacteria
that cause the disease.
c. He has disinfected his sickroom.
d. His body resists cold and fatigue.
e. He has had the disease three times.


4 Strang, op. cit. p. 536.

5 Arthur I. Gates and Ruth Strang, Gates-Strang Health Knowledge Test, Grades
7 to 12. Published by Bureau of Publications, Teachers College, Columbia University,
1937.


Excerpt from Kilander Health Knowledge Test ⁶


31. An occupational disease resulting from breathing certain types of
rock- or sand-dust over long periods of time is—
1. pneumonia.
2. the bends.
3. silicosis.
4. bursitis.

33. Which one of the following is probably most conducive to better
posture in the schoolroom?
1. having a chair and desk of the right height
2. having the light come from the left side
3. not carrying too many books to and from school
4. wearing comfortable clothing that does not bind


Health attitudes inventories 


Comparatively little effective work has been done in the measure- 
ment of health attitudes of pupils, except for the Health Attitudes 
Inventory illustrated on page 286. Health attitudes can be approached 
from two standpoints: (1) pupil attitudes toward health practices
and beliefs in certain courses of action, and (2) pupil likes and dis- 
likes for various types of foods, activities, and health practices. The 
same weakness is inherent in these instruments as in attitudes scales 
in general—there is little evidence to support the belief that expressed 
attitudes necessarily are borne out in terms of conduct. 

The Health Awareness Test, from which a few sample items are 
shown below, is one of the fairly recent publications of a closely re- 
lated type. It is the result of research in the health measurement 
field carried on by the American Child Health Association. 


Health evaluation inventories 


A series of health inventories was recently developed by the Co- 
operative Study in General Education for use from Grade 9 through 
the college years. The six inventories are numbered and entitled: I,
Health Activities; II, Health Information; III, Health Interests;
IV, Health Attitudes; V, Analyzing Health Problems; and VI, Judging
Sources of Information in Health Problems. Inventories III and
IV are represented by brief excerpts in Chapter 11, pages 288 and 286,
and other illustrations from the series appear in Chapter 9, page 224,
of this volume. These instruments represent undoubtedly the most
extensive attempt to evaluate instructional outcomes in health education.
They are intended to depict pupil needs for instruction
and guidance in this area. Norms are not provided.


6 H. F. Kilander, Kilander Health Knowledge Test. Published by World Book
Co., 1951.


Excerpts from Health Awareness Test ⁷


(Direction for matching test given on blackboard and orally.)

Wet feet
Bad cold
Bedroom
Garbage pail
Flies
Sore throat
Babies' milk
Sick people
Whiskey
Dirty dishes

Keep from breeding ( )
Should not touch other people's food ( )
Blow the nose gently, not hard ( )
Keep covered ( )
Scald with boiling water ( )
Should be very clean ( )
Should not be too warm ( )


Directions: If statement number 1 is true, put a circle around
the T, but if it is false, put a circle around the F. Do the same
for all the statements.

1. Candy should be eaten only at the end of a meal. ........ T F

2. An orange, a glass of milk, and hot cooked whole wheat cereal
is a better breakfast than an orange, a glass of milk, and
[illegible]. ........ T F

3. Hot cinnamon rolls or white rolls, fresh and hot, are the best
kind of bread for boys and girls. ........ T F


Physical examinations 


Physical examinations in connection with the evaluation of pupil 
health will be mentioned only briefly here, for they obviously are not 
the province of the classroom teacher. Health defects often come 


7 Raymond Franzen, Mayhew Derryberry, and W. A. McCall, Health Awareness
Test. Published by Bureau of Publications, Teachers College, Columbia University,
1933.




to light in these examinations, although the annual physical check- 
ups in some schools may be so perfunctory as to overlook serious 
health defects. 


3 PREVENTION AND DIAGNOSIS IN HEALTH EDUCATION 


Diagnosis in health education perhaps more than in any other 
instructional field must be considered both from the standpoint of 
school diagnosis and from the usual standpoint of individual pupil 
diagnosis. Class diagnosis has been discussed in several chapters of 
this volume, but school diagnosis is here mentioned for the first 
time. 

Diagnosis, considered as related to health, is a problem of such constant significance
that objective tests can be expected to contribute only in a
relatively indirect manner. They cannot be of value in diagnosing 
contagious diseases and other health conditions that demand im- 
mediate attention when cases are found. The diagnostic significance 
of results from health knowledge tests and health attitudes inven- 
tories is not specific. The former are survey rather than specifically 
diagnostic tools and the latter suffer from the fact that their results 
are not necessarily indicative of health behavior. Therefore, the 
major diagnostic possibilities in the field of health education are 
probably to be found elsewhere than in the standardized test, al- 
though the diagnostic significance of the Health Inventories discussed 
briefly above appears to be considerable. 

The physical examination has diagnostic significance, of course, 
but the advantages of continuous measurement and diagnosis are 
lost when such examinations are made only at infrequent intervals. 
It is possible, however, for the teacher to supplement the physical 
examination through his opportunities for the daily observation of 
pupils in the school. The teacher's place as a diagnostician, non- 
technical though his diagnoses may be, is fundamentally important. 
The teacher should recognize that sore throat, vomiting, skin rashes, 
and various evidences of contagious colds frequently indicate the 
desirability of an immediate dispatch of the pupil to the school 
nurse or to his home. It is through the abilities of teachers to 
diagnose illness, although not necessarily its specific nature, that 
individual and group pupil health are protected. The opportunity
for such diagnoses of pupil health is usually provided by the morning
inspection.

Diagnosis by the teacher can also be made for less immediately 
important health conditions as the result of continuous observation 
of pupils. For example, visual defects may be recognized through 
postural conditions during reading; auditory defects are some- 
times major causes of poor spelling; malnutrition frequently results 
in physical abnormality; goiter of some types is evidenced in a 
swelling of the thyroid gland in the throat; and such nervous ail- 
ments as epilepsy and chorea furnish unmistakable signs. The teacher 
can often supplement the work of the health agencies of the school 
by constant alertness for such signs of health defects and by con- 
sulting with qualified authorities or referring the case to the proper 
agencies when he recognizes characteristics he believes to be sympto- 
matic of defects needing remediation. 

Prevention as a phase of diagnosis is clearly important in health 
education. Isolation of pupils with contagious diseases, periodic 
chest X-rays, and immunization of pupils against smallpox, diph- 
theria, and typhoid fever are preventive responsibilities now accepted 
by the schools in many communities. Provision of healthful school 
conditions and a desirable type of school atmosphere and morale 
are also important as preventive measures. 


4 OBJECTIVES OF PHYSICAL EDUCATION 


Physical education during the past two decades has come to be 
thought of as making an increasingly valuable contribution to the 
educational process, and its philosophy has consequently been 
dominated recently by broader aims than were generally held pre- 
viously. The colleges and secondary schools have better-organized 
programs than do the elementary schools, as less attention has been 
devoted to physical education for elementary-school children than 
for high-school and college students. The statement of aims that 
follows represents the modern philosophy concerning the contribu- 
tion physical education should make to the attainment of desirable 
educational outcomes in the pupil. 

The general objectives of physical education listed by LaPorte ⁸
indicate the types of pupil outcomes to which a good physical
education program should lead.

8 William R. LaPorte, “The Ten Major Objectives of Health and Physical
Education.” California Physical Education, Health and Recreation Journal,
January 1936. p. 6.


1. The development of fundamental skills in aquatic, gymnastic,
   rhythmic, and athletic activities for immediate educational
   purposes: physical, mental, and social.

2. The development of useful and desirable skills in activities suitable
   as avocational interests for use during leisure time.

3. The development of essential safety skills and the ability to handle
   the body skillfully in a variety of situations for the protection of
   self and of others.

4. The development of a comprehensive knowledge of rules, techniques
   and strategies in the above activities suitably adapted to various age
   levels.

5. The development of acceptable social standards, appreciations and
   attitudes as the result of intensive participation in these activities
   in a good environment and under capable and inspired leadership.

6. The development of powers of observation, analysis, judgment, and
   decision through the medium of complex physical situations.

7. The development of the power of self-expression and reasonable
   self-confidence (physical and mental poise); by mastery of difficult
   physical-mental-social problems in supervised activities.

8. The development of leadership capacity by having each student
   within the limits of his ability, assume actual responsibility for
   certain activities under careful supervision.

9. The elimination of remediable defects and the improvement of
   postural mechanics insofar as these can be influenced by muscular
   activities and health advice, based on adequate physical and health
   diagnosis.

10. The development of essential health habits, health knowledge and
    health attitudes as the result of specific instruction in health
    principles and careful supervision of health situations.


5 MEASUREMENT IN PHYSICAL EDUCATION 


Persons interested in testing and evaluating various aspects of 
physical ability and skill will find almost no commercially published 
paper-and-pencil tests for those purposes. On the other hand, they 
will find voluminous reports of testing and evaluative techniques in 
the educational literature, both books and journals. It should be 
apparent that measures of physical fitness, motor ability, and physi- 
cal skills must be conducted by means of physical and medical 
measurements and tests and by observation of behavior in situations
involving physical activity rather than by the use of standardized 
tests. 


Tests of general physical qualities


Tests of such qualities as motor ability, physical capacity, and 
athletic ability, all commonly thought to be inherited, are considered 
under this heading. Each of these tests consists in the main of various 
measures of motor abilities and physical skills combined into a 
composite score. Their results are useful variously as a basis for 
classifying pupils into groups for physical education and for predict- 
ing levels of physical attainment. 

Rogers devised a series of physical tests for administration to
individual pupils from which two types of derived scores are obtained:
a strength index and a personal fitness index.⁹ The tests, having
different procedures in some instances for boys and girls, are 
accompanied by tables of normal strength indices differentiated 
for age and sex groups. Their major purpose is to determine defi- 
ciencies and to facilitate classification of pupils into groups having 
common remedial needs. No presentation of these tests can be 
given here because of their detailed nature. 

A test of general motor capacity based on various specific tests 
of track and field events and of strength was developed by McCloy.¹⁰
Something approaching a profile of the individual’s general capacity 
is furnished by the results of this test, which has particular value in 
the prediction of ultimate levels of attainment. 

The Iowa Revision of the Brace Test of Motor Ability consists of
21 physical stunts yielding a composite score of motor educability.¹¹
The tests are so devised and set up that pupils can do the scoring, and
the recommended procedure is that one half of the class score the
other half on performance of the stunts and that the two groups then
be reversed. Scores are interpreted in terms of T-score values of the
type dealt with in Chapter 13.


9 Frederick R. Rogers, Physical Capacity Tests. A. S. Barnes and Co., New
York, 1938.

10 C. H. McCloy, “The Measurement of General Motor Capacity and General
Motor Ability.” Research Quarterly of the American Physical Education Association,
5:46-61; March 1934.

11 C. H. McCloy, “An Analytical Study of the Stunt Type Test as a Measure of
Motor Educability.” Research Quarterly of the American Physical Education
Association, 8:46-55; October 1937.

Johnson devised a test of motor educability that has values for
the sectioning of classes.¹² Although it is slower and more difficult
to administer than the Iowa-Brace Test, it is rated as probably the
best available test of motor educability.


Cardiovascular tests 


Good and poor physical condition can be determined by cardio- 
vascular tests involving pulse rate and blood pressure under varying 
conditions of rest and fatigue. Several of the tests of this type based 
on pulse counts are sufficiently easy to administer and require so 
little equipment that they are suitable for use by the skilled teacher.¹³
The significance of such tests is somewhat reduced by the fact that 
most of them measure only one type of physiological efficiency, 
whereas some other of the important variables of blood pressure, 
pulse rate, and related functions are not well enough understood at 
this time to be included in these tests. 


Posture tests 


Posture tests cannot be administered in a routine manner in the 
usual school situation because of their complex nature. Most of 
the tests of posture are based on comparisons of pupil silhouettes 
with silhouettes representing standard posture or representing several 
degrees of postural merit from very poor to excellent. Because of the 
wide variability in posture and the incomplete nature of evidence 
on the question, Ashbrook and his colleagues suggested that teachers
should probably not attempt to make pupils conform to any pattern
considered desirable.¹⁴


12 Granville B. Johnson, “Physical Skill Tests for Sectioning Classes into
Homogeneous Units.” Research Quarterly of the American Physical Education
Association, 3:128-36; March 1932.

13 C. H. McCloy, Tests and Measurements in Health and Physical Education.
F. S. Crofts and Co., New York, 1939. Chapter 20.

14 Willard P. Ashbrook, Anna Espenschade, and Frederick W. Cozens, "Physical 
Education—Measurement.” Encyclopedia of Educational Research, Revised edition. 


Macmillan Co., New York, 1950. p. 836. 


General achievement scales


General achievement scales have been developed for the measure- 
ment of ability in various sports activities. Their purposes are to 
stimulate pupil interest and performance, to determine the sports 
skills of individual pupils and groups, and to diagnose deficiencies. 
Such scales are highly time-consuming, however, and have not been 
adequately validated.¹⁵


Knowledge and information tests 


Paper-and-pencil tests of knowledge and information in specific
sports activities and comprehensively for all activities have appeared
in the physical education journals but have not been published
commercially in standardized form. The following illustrations indicate
the manner in which various objective item forms are adaptable to
measurement of knowledge and information in this field.¹⁶


TRUE-FALSE ITEMS


Encircle the correct answer: 


T F The follow-through in a golf drive determines the accuracy 
of the flight of the ball. 


T ? F   There are African negro tribes who have athletes able to
        high jump to heights greater than the present American
        record.


Encircle the correct answer. If the answer is false, cross out the word 
which makes it false and insert the word that makes it true. 


             anopheles
T F      The culex mosquito is the transmitter of the malaria germ.


Underline T if the statement is true and F if the statement is false. 
If the converse of the statement is true, underline CT; if the converse is 
false, underline CF. 


T F CT CF Low arches are always painful. 


15 Inasmuch as these scales are too highly specialized to warrant presentation or
illustration here, the student should refer to the bibliography at the end of this
chapter for source materials.

16 McCloy, op. cit. p. 190-97.


MULTIPLE-CHOICE ITEMS


Place an X in the space before the phrase which correctly completes 
the statement. 


The world's record for the mile run is approximately:
____ 26' 20"
____ 4' 6"
____ 8' 11"
____ 2' 19"


Check the one or more correct answers under each statement: 


According to the currently accepted “best” form for the shot-put
(for a right-handed putter):

____ In the hop, the right foot alights well before the left foot.
____ The shot should remain as close to the neck as possible.
____ The reverse is of no importance, and is just a traditional
     movement.
____ The shot should be held deep in the palm of the right hand.
____ The best angle (to the ground) of the putting effort is
     approximately forty-one degrees.


MATCHING EXERCISES 


In the following questions, write the number belonging to the
approximately correct date in the first space and the number
corresponding to the correct name in the second space.


At about the year:

____, a physical education program was introduced at the
      Philanthropium in Dessau by ____.
____, physical education was established at the Round Hill
      School in the United States by ____.
____, a department of physical education was opened in the
      Y. M. C. A. Training School at Springfield, Massachusetts,
      under the guidance of ____.
____, the King of Denmark appointed as professor of physical
      education in the university ____.
____, the modern Olympic Games were revived, largely because
      of the work of ____.

Dates                      Names
1. 1770     6. 1887        1. Basedow          6. Hitchcock
2. 1799     7. 1897        2. Beck             7. Jahn
3. 1804     8. 1902        3. Bukh             8. Ling
4. 1810     9. 1906        4. de Coubertin     9. McCurdy
5. 1823    10. 1924        5. Gulick          10. Nachtegall


COMPLETION EXERCISES


Fill in the blank spaces with the words which most accurately complete
the statement.


In the high school low hurdles race, the distance from the start to
the first hurdle is ____ yards; it is ____ yards between the
hurdles; and it is ____ yards from the last hurdle to the finish.
There are ____ hurdles to be cleared.


Tests of proficiency in sports 


Numerous articles in the physical education journals present tests
of techniques in a variety of sports.¹⁷ These tests are usually based
upon an analysis of the skills involved in the sport. Validation of the
batteries of tests is by means of comparisons between scores made by
pupils and teachers’ judgments of pupil proficiency. Ashbrook and
his colleagues listed the Johnson Test for high-school boys, the Dyer
Backboard Test for Tennis Ability, the French-Cooper and Russell-
Lange Volleyball Tests for high-school girls, and the Dyer-Schurig-
Apgar Basketball Test for high-school girls as those available for
use at the high-school level.¹⁸


Physical classification tests 


The importance of tools to be used in the classification of pupils
for physical education and particularly for competitive sports is
obvious. Physical differences among pupils of the same age are so great
that classification by chronological age is likely to result in injuries
to the smaller and weaker children and usually deprives them of
adequate opportunities for exercise. Two indices useful for
classification purposes at the junior-high-school and secondary-school
levels have been validated. As the brief indications of their nature make
clear, such indices are obtained by the use of physical rather than
paper-and-pencil tests.

17 See bibliography at end of this chapter for such references.
18 Ashbrook, Espenschade, and Cozens, op. cit. p. 837-38.

Cozens, Trieb, and Neilson developed a classification index for
secondary-school boys.¹⁹ The formula is as follows:


Classification Index = 2A + .475H + .16W, 


where A refers to age in years, H to height in inches, and W to
weight in pounds. Another index making use of the same physical
and age measures was derived for junior-high-school girls.²⁰ The
index is obtained by the use of the formula:


Index = 2A + H + .11W.


Ashbrook, Espenschade, and Cozens stated that the factors of
age, weight, and height when properly combined are probably almost
as useful for classification purposes as are more complex
measures and the simplicity of their use is of considerable
importance.²¹
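
The arithmetic of the two indices can be illustrated with a short
sketch. It takes the coefficients exactly as printed above; the pupil
names and measurements are hypothetical, and the text gives no cut-off
values for forming groups, so the sketch only ranks pupils by index.

```python
def classification_index_boys(age_years, height_in, weight_lb):
    """Index for secondary-school boys as printed: 2A + .475H + .16W."""
    return 2 * age_years + 0.475 * height_in + 0.16 * weight_lb


def classification_index_girls(age_years, height_in, weight_lb):
    """Index for junior-high-school girls as printed: 2A + H + .11W."""
    return 2 * age_years + height_in + 0.11 * weight_lb


# Hypothetical pupils: (name, age in years, height in inches, weight in pounds)
pupils = [
    ("Pupil A", 14, 62, 105),
    ("Pupil B", 16, 68, 140),
    ("Pupil C", 15, 65, 120),
]

# Rank pupils from lowest to highest index so that neighbours in the
# list are physically comparable.
ranked = sorted(pupils, key=lambda p: classification_index_boys(*p[1:]))

for name, age, height, weight in ranked:
    index = classification_index_boys(age, height, weight)
    print(f"{name}: {index:.2f}")
```

Ranking by the index rather than by age alone reflects the point made
above: pupils of the same age may differ greatly in height and weight.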


6 DIAGNOSIS IN PHYSICAL EDUCATION 


Diagnosis in physical education as well as in health education
appears to depend much more upon teacher observation and physical
examinations than upon any standardized testing devices of the
pencil-and-paper type. The tests of general physical qualities and of
physical fitness serve some diagnostic functions. Other tests of
diagnostic value are those for the measurement of blood pressure under
varying conditions of fatigue. Both of these types can be given by
a skilled teacher. Still other tests require technical knowledge and
equipment not ordinarily possessed by the teacher.

A significant trend in diagnosis in physical education is that the
issue is being approached from the functional rather than the
structural standpoint. Even with functional tests, however, it is felt by
some that the tests fail to measure the functioning of such organs as
the nervous system, for example, completely enough to furnish a
highly satisfactory diagnostic score.²²


19 F. W. Cozens, M. H. Trieb, and N. P. Neilson, Physical Education Achievement
Scales for Boys in Secondary Schools. A. S. Barnes and Co., New York, 1936.
20 F. W. Cozens, Hazel J. Cubberley, and N. P. Neilson, Achievement Scales in
Physical Education Activities for Secondary School Girls and College Women. A. S.
Barnes and Co., New York, 1937.
21 Ashbrook, Espenschade, and Cozens, op. cit. p. 837.


Topics for Discussion 


1. Discuss the aims of health education. 
2. Comment upon the nature and present status of health knowledge 
testing. 
3. What is a major limitation of health attitudes inventories? 
4. Discuss some of the preventive and diagnostic procedures in health 
education for use in the classroom. 
5. What are the major objectives of physical education? 
6. In what way are some of the measures of general physical qualities 
useful indications of health status? 
7. In what way are cardiovascular and posture tests useful in physical 
education? 
8. Illustrate some methods of testing knowledge and information in
physical education. 
9. Indicate the nature of tests of proficiency in sports. 
10. Give some of the procedures useful in the classification of pupils for 
physical education. 
11. Discuss diagnostic methods in physical education.


Measuring and Evaluating
General Educational Achievement 


This chapter treats the following points in the measurement and
evaluation of general educational achievement:

Advantages of general achievement batteries.
Limitations of measures of general achievement.
General vs. specific surveys.
Types of achievement batteries.
Some distinctive features of certain achievement batteries.


The emphasis throughout this volume is rather definitely on diag- 
nostic and analytic testing and on evaluative techniques in subject 
and performance areas. However, a consideration of the practical 
problems of measurement and evaluation in the classroom leads to 
the conviction that there is a real service to be rendered by survey 
tests of general achievement. Accordingly, tests of that type are
treated briefly here.


1 MEASUREMENT OF GENERAL ACHIEVEMENT 


General vs. specific measurement 


The battery type of general achievement test opens up certain 
types of possibilities for diagnostic, analytic, and remedial work and 
for the use of test results in educational guidance. Such a test affords 
a rather complete survey of the pupil's educational status. It presents
a perspective of the aspects of his accomplishment measurable
by paper-and-pencil tests.

A general survey test of the types described in this chapter may 
reveal a specific weakness in, for example, the language skills. To the 
critical teacher this is a challenge to discover more exactly the 
factors underlying the deficiency. Accordingly, the cases identified 
by the test as weak in certain areas should be subjected to a detailed 
analytic or diagnostic test in the subject for the purpose of locating 
specific difficulties and their causes. 

Results from general achievement tests are of further value in 
affording one basis for educational guidance of pupils. Evidence 
concerning the suitability for the individual pupil of certain sub- 
sequent courses or programs and even of certain vocations may 
often be obtained early in the pupil's school career. When cumulated 
over a period of years and when supplemented by other evidence, 
such test results can contribute significantly to pupil guidance. 


Types of general achievement batteries 


There are available today a number of general achievement bat- 
teries, several of which have very distinct merit. Most of the better- 
known and more widely used batteries are briefly described here. No 
attempt is made to illustrate their measurement techniques, for the 
wide variety of outcomes tested makes that impracticable. More- 
over, illustrations from some of these tests appear in preceding 
chapters of this volume. The batteries discussed and summarized 
below are classified under five headings: (1) general achievement
batteries for the junior high school, (2) general achievement bat- 
teries for the six-year high school, (3) general achievement batteries 
for the senior high school, (4) specialized achievement batteries, and 
(5) batteries for measuring general educational achievement. 


Advantages and limitations of general achievement tests 


Among the specific qualities of the battery type tests of general 
achievement that have been given considerable emphasis by persons 
interested in the improvement of classroom measurement are the 
following:

Comparable units of measurement. The use of a uniform unit of
measurement in the scaling of battery tests constitutes a real
advantage in the interpretation of the test results and in the comparisons
of results from one subject to another. While this is an important
advantage, it does not mean that uniformity in units of measurement
may not be secured in single tests in unrelated subjects.

Unity of population in standardization. The fact that the standard- 
ization of most comprehensive batteries is based upon results from 
the same pupils for each of the different subject tests insures a 
better picture of the relationships of achievement in these different 
subjects. For example, the reading achievement of pupils of a certain 
grade can be compared with their language achievement only when 
tests are standardized under these conditions. 

Simplicity of interpretation. The use of comparable units of 
measurement and similar testing techniques in the several tests 
comprising a general achievement battery simplifies the problems 
of comparing and interpreting the results. The raw test scores are
readily turned into standard scores, educational ages, and grade 
equivalents. Modern graphic methods of summarizing test results 
make effective use of such derived scores. Profile charts of the type 
commonly provided with these tests add to the clearness with which 
test results may be interpreted. Naturally, such profiles are useful 
only in case test scores from a number of different tests are reducible 
to a common unit of measurement. 
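
The reduction of raw scores to a common unit can be sketched as
follows. This is a minimal illustration only: it assumes a linear
standard score with a mean of 50 and a standard deviation of 10, and
the norm-group means, standard deviations, and pupil scores are made up
for the example. A published battery supplies its own conversion
tables, which would be used in practice.

```python
def standard_score(raw, norm_mean, norm_sd, mean=50.0, sd=10.0):
    """Convert a raw score to a linear standard score on a common scale."""
    return mean + sd * (raw - norm_mean) / norm_sd


# Hypothetical norm-group statistics per test: (mean, standard deviation)
norms = {
    "Reading": (40.0, 8.0),
    "Language": (55.0, 10.0),
    "Arithmetic": (30.0, 5.0),
}

# Hypothetical raw scores for one pupil
raw_scores = {"Reading": 48, "Language": 50, "Arithmetic": 30}

# Once every test is on the same scale, the scores form a profile
# that can be compared across subjects.
profile = {
    subject: standard_score(raw_scores[subject], *norms[subject])
    for subject in norms
}

# Crude text profile chart: one '#' for every two standard-score points.
for subject, score in profile.items():
    print(f"{subject:<12}{score:5.1f}  {'#' * round(score / 2)}")
```

The raw scores of 48, 50, and 30 are not comparable as they stand; on
the common scale the pupil's relative standing in each subject can be
read directly from the profile.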

Ease of administration and scoring. The tendency of the authors 
of battery type tests to utilize the same or similar types of testing 
techniques throughout the battery unquestionably tends to simplify 
the problems of administering it. The use of uniform methods of 
recording the pupil's responses also simplifies the problem of scoring. 
In general, however, such battery tests are usually so long that the 
time required to administer them and the labor involved in scoring 
them become quite great. However, it may be that this is not too 
high a price to pay for extensive sampling and valid and reliable 
measurement. Furthermore, many of these tests are now available 
for either hand- or machine-scoring. 

Economy of cost. Any economy that results from the use of battery 
tests appears to be conditioned by the assumption that broadly 
diagnostic rather than specifically diagnostic or analytic measure- 
ment is desired. It is probably true that almost any one of the 
modern batteries of achievement tests will furnish a wider sampling 
into more subject fields at a lower cost per pupil than could be 
accomplished by the selection of single-subject tests for the purpose. 
There are numerous occasions, however, when it is of greater importance
to measure more intensively a limited range of subjects. For
this type of measurement the battery tests are usually not the most 
economical. In order to provide for this situation, the authors of 
most test batteries have prepared the tests for certain subjects in 
separate form. Indeed, several of the battery tests are available 
only in the form of separate coordinated test booklets. 


2 GENERAL ACHIEVEMENT BATTERIES—JUNIOR HIGH SCHOOL 


Most of the achievement test batteries now available for use in 
Grades 7 to 9 are designed for measuring outcomes in all of the 
major instructional areas common to the junior high school. Nearly 
all of them occur as the advanced or upper level tests of batteries 
constructed at several levels for use from the elementary or even 
the primary grades to Grade 9. These batteries for use in the junior 
high school typically include separate tests or parts on the expressive 
and receptive language arts and computational skills and on the 
major content areas of science and social studies. Some of them also 
provide tests or parts in the areas of health and safety. 

Space limitations do not permit even brief discussions of all of 
these batteries here. Consequently, only four of the batteries, all of 
which are suitable for use in Grades 7 to 9, are dealt with below. The
other six such batteries—the American School Achievement Tests, 
the Coordinated Scales of Attainment, the Master Achievement Tests, 
the Modern School Achievement Tests, the National Achievement 
Tests, and the Progressive Tests in Social and Related Sciences—are 
similarly treated in the companion volume of this book.¹


Stanford Achievement Tests 


The original battery of these tests, published in 1923, was one of 
the outstanding measuring instruments of that period. The tests set 
new standards of validity, reliability, and other examination criteria 
for later workers and undoubtedly did much to stimulate the im- 
provement of educational measurement in general. After six years, 
and on the basis of much critical analysis and experimentation, the 
tests were revised in the form known as the New Stanford Achievement
Tests. They have more recently been revised a second and a
third time and are again known as the Stanford Achievement Tests.
In its present form the Advanced battery, for the junior high
school, consists of nine tests for Grades 7 to 9. The testing time is
227 minutes. The tests are also available for machine scoring in
partial batteries. Norms are of two types: (1) percentile norms by
grades, and (2) modal-age norms, based only on those pupils in the
standardization group who were at grade for their age.

1 Harry A. Greene, Albert N. Jorgensen, and J. Raymond Gerberich,
Measurement and Evaluation in the Elementary School, Second edition.
Longmans, Green and Co., New York, 1953. Chapter 22.


TABLE 37. Summary of tests and timing: Stanford Achievement Tests ²


Area              Test

Language Arts     Reading: Paragraph Meaning
                  Reading: Word Meaning
                  Spelling
                  Language
Arithmetic        Arithmetic Computation
                  Arithmetic Reasoning
Social Studies    Social Studies
Science           Science
Study Skills      Study Skills


Metropolitan Achievement Tests 


The present batteries of these tests represent a second revision. 
The original edition was published in the 1920s and the first revision 
was issued in several forms during the period 1931 to 1937. Ten 
tests appear in the Advanced battery for Grades 7 to 9, including 
tests in the two content areas of social studies and sciences. The
working time required is 225 minutes. The battery is issued in a 
single-booklet edition in Forms R, S, T, and U. The reading tests 
and the arithmetic tests are also available in separate booklets. 

Grade and age norms of both the traditional and the modal-age
types are furnished for each test. Percentile grade norms of both the
traditional and modal-age types are also provided for each test for
October 16-November 15, January 16-February 15, and April 16-May
15 testing dates. Traditional grade norms are also furnished for pupils
in parochial schools and for Negro pupils in segregated schools.


2 Truman L. Kelley and others, Stanford Achievement Tests, Advanced. World
Book Co., Yonkers, N. Y., 1953.


TABLE 38. Summary of tests and timing: Metropolitan Achievement
Tests ³


Area              Test                       Timing (minutes)

Reading           Reading                    25
                  Vocabulary                 10
                  Literature                 15
Arithmetic        Arithmetic Fundamentals    40
                  Arithmetic Problems        40
Language          Spelling                   15
                  English                    35
Social Studies    History                    15
                  Geography                  15
Science           Science                    15
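
The per-test timings in Table 38 can be checked against the working
time of 225 minutes stated in the text. The short sketch below simply
transcribes the ten timings from the table and sums them; the variable
names are illustrative only.

```python
# Minutes per test, transcribed from Table 38 (Metropolitan, Advanced).
timings = {
    "Reading": 25,
    "Vocabulary": 10,
    "Literature": 15,
    "Arithmetic Fundamentals": 40,
    "Arithmetic Problems": 40,
    "Spelling": 15,
    "English": 35,
    "History": 15,
    "Geography": 15,
    "Science": 15,
}

stated_working_time = 225  # minutes, as given in the text

total = sum(timings.values())
print(f"{len(timings)} tests, {total} minutes in all")
print("matches stated working time:", total == stated_working_time)
```

The ten entries and their sum agree with the ten tests and the working
time reported for the Advanced battery.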


Gray-Votaw-Rogers General Achievement Tests 


A revision of the Gray-Votaw General Achievement Tests, the
1951 edition at the Advanced level provides tests for Grades 7 to 9.
Forms Q, R, S, and T are available. Pupil working time is 135
minutes. The battery is hand-scorable, but an Abbreviated Edition,
available in Forms U, V, W, and X for Grades 5 to 9, is accompanied
by machine-scorable answer sheets that can also be scored manually.
Grade and age norms are provided for each test and for a total score
and percentile grade norms are given for the total battery score.


TABLE 39. Summary of tests and timing: Gray-Votaw-Rogers General
Achievement Tests ⁴


Area                 Test

Reading              Comprehension
                     Vocabulary
                     Literature
Arithmetic           Computation
                     Reasoning
Language             Language
                     Spelling
Social Studies       Social Studies
Science              Elementary Science
Health and Safety    Health and Safety


3 Gertrude H. Hildreth, Richard D. Allen, and others, Metropolitan Achievement
Tests, Advanced. World Book Co., Yonkers, N. Y., 1946.

4 Hob Gray, David F. Votaw, and J. Lloyd Rogers, Gray-Votaw-Rogers General
Achievement Tests, Advanced. Steck Co., Austin, Texas, 1950-51.


Iowa Every-Pupil Tests of Basic Skills


Growing out of the basic skills tests used in a state-wide testing 
program in Iowa over a period of years, the present edition of this 
battery for Grades 5 to 9 cuts across traditional subject lines in at 
least one test. Tests A, C, and D are designed for the measurement 
of reading, language, and arithmetic skills respectively, but Test B, 
Work-Study Skills, has no direct counterpart in the typical program 
of studies. The tests are issued in four forms: L, M, N, and O.
Separate answer sheets for machine-scoring or hand-scoring are available.
The pupil working time is 268 minutes. 


TABLE 40. Summary of tests and timing: Iowa Every-Pupil Tests of Basic
Skills ⁵


Test                    Part                                   Timing

Silent Reading          Reading Comprehension                  58
Comprehension           Vocabulary
Work-Study Skills       Map Reading
                        Use of References
                        Use of Index
                        Use of Dictionary
                        Reading Graphs, Charts, and Tables
Basic Language          Punctuation
Skills                  Capitalization
                        Usage
                        Spelling
Basic Arithmetic        Vocabulary and Fundamental Knowledge
Skills                  Fundamental Operations
                        Problems


5 E. F. Lindquist, editor, Iowa Every-Pupil Tests of Basic Skills, Advanced.
Houghton Mifflin Co., Boston, 1940-43.

Both pupil norms and school norms are provided with the test
battery. Pupil norms for each part and each test are of three types: 
(1) grade norms, (2) age-at-grade norms, and (3) percentile grade 
norms. School norms of the type discussed in Chapter $ are also 
furnished for each test part. 


3 GENERAL ACHIEVEMENT BATTERIES—SIX-YEAR HIGH SCHOOL 


Only two batteries for general achievement testing at the grade 
levels 7 to 9 and 9 to 12 are known to the authors. One of these con- 
sists of tests designed for the language arts and computational skills 
areas only and the other includes tests for the major content areas as 
well as for the skills of communication and computation. 


California Achievement Tests 


The Progressive Achievement Tests have been retitled California 
Achievement Tests in the new 1950 editions. Coverage of Grades 7 
to 14 is provided by the Intermediate and Advanced levels of the 
battery. The six tests are published in three separate booklets at each 
level, with one booklet each for reading, arithmetic or mathematics, 
and language, and also in single-booklet editions at both levels. Four 
forms—AA, BB, CC, and DD—are available at the lower level, and 
three forms—AA, BB, and CC—have been issued for the higher level. 
Working time for pupils are тт and 148 minutes for the Inter- 
mediate and Advanced tests, respectively, when answers are recorded 
in the test booklets and slightly more when, as is optional,
machine-scored answer sheets or the CTB Scoreze answer sheets are used with
the booklets. 

Grade and age equivalents and percentile grade norms are provided 
for each of the six tests, for total achievement separately in reading, 
arithmetic or mathematics, and language, and also for a total score 
on the complete battery. A handwriting test is provided for use if 
desired and a scale for interpreting quality in terms of grade place- 
ment appears in the manuals for the language test. However, age 
equivalents and percentile norms are not provided for handwriting, 
since this test is a supplement to rather than an integral part of 
the battery. 
TABLE 41. Summary of tests and timing: California Achievement Tests 6

Level, Grades, and Timing: the first figure is for the Intermediate
level (Grades 7-9), the second for the Advanced level (Grades 9-14).

Area                  Test                    Part                          Timing

Reading               Vocabulary              Mathematics                   3, 4
                                              Science                       3, 4
                                              Social Studies                3, 4
                                              General                       3, 4
                      Comprehension           Following Directions          8, 8
                                              Reference Skills              5, 6
                                              Interpretation of Meanings    25, 20
Arithmetic (Int.),    Reasoning               Number Concepts               4, 5
Mathematics (Adv.)                            Symbols and Rules             5, 5
                                              Numbers and Equations         5, 5
                      Fundamentals            Addition                      10, 9
                                              Subtraction                   10, 9
                                              Multiplication                12, 10
                                              Division                      12, 10
Language              Mechanics of English    Capitalization                3, 3
                      and Grammar             Punctuation                   4, 2
                                              Words and Sentences           4, 5
                                              Parts of Speech               6, 6
                                              Syntax                        4
                      Spelling                Spelling                      10, 10


Cooperative Achievement Tests 


Several overlapping series of Cooperative Achievement Tests are 
available in the areas of English, mathematics, science, and social 
studies for use at the junior-senior high-school level. Separate answer 
sheets for machine- or hand-scoring are provided for all tests. Several 
forms of each test are now available and new forms are issued 
periodically. End-of-year percentile grade norms based on scaled 
scores are furnished for pupils in several types of schools. 

Two separate batteries are provided in English. The Mechanics 
of Expression Test, A, is included in both, whereas the Effectiveness 
of Expression Test and the Reading Comprehension Test appear in 
B1 and C1 levels respectively for Grades 7 to 12 and similarly in
B2 and C2 editions for Grades 10 to 13. These tests are issued in two 


6 Ernest W. Tiegs and Willis W. Clark, California Achievement Tests, Inter-
mediate and Advanced. California Test Bureau, Los Angeles, 1950.
single-booklet batteries and also as five separate booklets. Testing
time is 40 minutes for each test and 120 minutes for each of the
two batteries.


TABLE 42. Summary of tests and timing: Cooperative Achievement
Tests 7

Grades and Timing (7-9, 10-13)

Area              Test                           Part                                Timing

English           Mechanics of Expression        Grammatical Usage                   15
                                                 Punctuation and Capitalization      15
                                                 Spelling                            10
                  Effectiveness of Expression    Sentence Structure and Style        15
                                                 Diction                             10
                                                 Organization                        18
                  Reading Comprehension          Vocabulary                          15
                                                 Speed of Comprehension
                                                 Level of Comprehension
Mathematics       Mathematics                    Skills
                                                 Facts, Terms, and Concepts
                                                 Applications
                                                 Appreciation
                                                 Terms and Concepts
                                                 Comprehension and Interpretation
Science           Science                        Informational Background
                                                 Terms and Concepts
                                                 Comprehension and Interpretation
Social Studies    Social Studies                 Informational Background
                                                 Terms and Concepts
                                                 Comprehension and Interpretation


25 


Two series of tests are also available for the three content areas. 
One consists of three separate booklets in mathematics, science, and 
social studies specifically entitled for use in Grades 7, 8, and 9. The 


7 Cooperative Achievement Tests. Cooperative Test Division, Educational Testing
Service, Princeton, N. J., 1947-53.
other is the series of three separate tests in the same areas for use
in Grades 10 to 13 and designated as Cooperative Tests of General
Proficiency. The lower level tests are timed at 80 minutes each and
the higher level tests require 40 minutes each of working time by
the pupils.


4 GENERAL ACHIEVEMENT BATTERIES—SENIOR HIGH SCHOOL 


Although more general achievement test batteries are available for 
the junior high school than for the senior high school or the four-year 
high school, several such batteries have been published for use in 
Grades 9 to 12 and in some instances also for the freshman year of
college.


Essential High School Content Battery 


This single-booklet battery includes tests in mathematics, science, 
social studies, and English. Available in Forms AM and BM, the 
test provides for the recording of pupil responses on a separate 
answer sheet that is scorable either manually or by machine. The 
battery, requiring 225 minutes of working time, is designed for use 
with pupils in Grades 9 to 12 and with entering college freshmen. 
End-of-year percentile grade norms are provided for each of the four 
tests and a median of the four scores for pupils in Grades 9 to 12.
The general norms for all pupils are supplemented by differentiated
norms for pupils in academic and scientific courses and pupils in
commercial and general courses.


Iowa High School Content Examination


The original edition of this test, published in 1925, was followed 
in 1943 by the revised, quick-scoring edition. The present single- 
booklet edition includes sections on the four major areas of sec- 
ondary-school instruction and is designed to measure primarily 
knowledge and skill outcomes. Available in Forms L and M, the test 
booklet is intended for multiple use. Two forms of special answer 
sheets—one for hand-scoring and the other for either manual- or 
machine-scoring—are available. The battery requires 75 minutes of 
TABLE 43. Summary of tests and timing: Essential High School Content
Battery 8

Area              Nature

Mathematics       Fundamental Skills in Computation
                  Vocabulary and Concepts
                  Understanding of Functional Relationships
                  Application of Mathematics to Life Problems
                  Interpretation of Mathematical Graphs
                  Knowledge of Mathematical Facts and Formulas
                  Interpretation of Data in Tabular Form
                  Knowledge of Important Theorems

Science           Science Information
                  Using the Concepts of Science
                  Using the Methods of Science

Social Studies    Acquaintance with Contributions of Famous Americans
                  Understanding of Current Social and Political Problems
                  Understanding of Vocabulary of Social Studies
                  Knowledge of Civic Information
                  Growth of American Democracy
                  Knowledge and Understanding of Global Geography
                  Knowledge of Contributions of World Leaders
                  Understanding of International Relationships
                  Knowledge of Sequence of Events in United States History
                  Knowledge of World Events

English           Reading for Information
                  Vocabulary
                  Business Definitions
                  Use of References
                  Literature Acquaintance
                  Language Usage
                  Capitalization and Punctuation
                  Spelling


8 David P. Harry and Walter N. Durost, Essential High School Content Battery. 
World Book Co., Yonkers, N. Y., 1950. 
testing time. Percentile grade norms for April testing are provided
for scores on each of the four sections and for a total score. 


TABLE 44. Summary of sections and timing: Iowa High School Content
Examination 9

Area              Section                       Timing

English           English and Literature        20
Mathematics       Mathematics                   20
Science           Science                       15
Social Studies    History and Social Studies    20


Myers-Ruch High School Progress Test 


This single-booklet test for use in the high school and the first 
year of college consists of four parts—one each in English and the 
three major content areas of secondary-school instruction. It is avail- 
able in Forms AM and BM, and the back of the cover page is printed 
as an answer sheet for hand-scoring. Separate answer sheets are also 
available either for manual- or machine-scoring. The battery requires 
60 minutes of testing time. Percentile grade norms are provided for 
the four high-school classes. 


TABLE 45. Summary of tests and timing: Myers-Ruch High School
Progress Test 10

Test

English
Social Studies
Mathematics
Science


9 D. B. Stuit, H. A. Greene, and G. M. Ruch, Iowa High School Content Ex-
amination, Revised. Bureau of Educational Research and Service, State University
of Iowa, Iowa City, 1943.
10 Charles E. Myers, Giles M. Ruch, and Graham C. Loofbourow, Myers-Ruch
High School Progress Test. World Book Co., Yonkers, N. Y., 1936-38.
5 SPECIALIZED BATTERIES


Two specialized batteries involving the measurement of general
achievement are the Comprehensive Test Program and the Otis
Classification Test.12 The former, for use in Grades 4 to 9, includes
an intelligence test, an educational background questionnaire, a
school practices questionnaire, and a comprehensive achievement
test. The latter, covering the same grade range, consists of two parts
measuring general intelligence and general achievement.


6 TESTS OF GENERAL EDUCATIONAL DEVELOPMENT 


The two tests of general educational development treated briefly 
below differ from the general achievement tests dealt with in the 
preceding sections of this chapter in several ways. They are less 
closely keyed to specific areas or fields of high-school instruction than 
are other general achievement tests. They are, as is implied by their 
titles, based more on the philosophy and practice of general education 
than are the tests treated above. They are concerned less with knowl- 
edge of content and more with functional skills and applications than 
are most test batteries at the secondary-school level. The result is 
that they deal less with the formal and often temporary instructional 
outcomes and are more concerned with the level at which an indi- 
vidual can use knowledge in functional situations than are the other 
tests discussed in this chapter. 


Iowa Tests of Educational Development


The Iowa Tests of Educational Development are the outgrowth of
a state-wide testing program for secondary schools conducted over 
a period of years by the State University of Iowa. The tests were 
made widely available by their publication in a single-booklet edition 
for nation-wide use in 1942. The original X-1, X-2, Y-1, and Y-2 
forms were followed in 1952 by the X-3 form. A single answer sheet 


11 William A. McCall and John P. Herring, A Comprehensive Test Program: 
Manual for Teachers. Laidlaw Brothers, Inc., Chicago, 1937. 
12 Arthur S. Otis, Otis Classification Test. World Book Co., Yonkers, N. Y., 1941.
is used for this battery and the results are scored and reported to
schools using the tests by the State University of Iowa through ar- 
rangements with the publisher. The Y-2 form was also issued in 1951 
in nine separate booklets with accompanying answer sheets subject 
to manual- or to machine-scoring. 

This integrated battery of tests for Grades 9 to 13 is designed to
yield a comprehensive description of each pupil not only in the sense 
that it measures all broad aspects of intellectual development that 
are readily measurable but also in the more important sense that 
emphasis is placed on the ultimate outcomes of the educational pro- 
gram rather than on the outcomes in the separate school subjects or 
areas. The four general background tests in social studies, science, 
English, and mathematics are supplemented by functional tests of 
abilities to interpret reading materials in the first three of these areas 
and by tests of general vocabulary and use of sources of information. 
Test 10, Understanding of Contemporary Affairs, was introduced in
an answer sheet edition in 1952, although it is not an integral part 
of the battery. 

Provision is made for the reporting of separate standard scores for 
each of the nine basic tests and for a composite of Tests 1 to 8 on 
pupil profile cards prepared in quadruplicate. The battery is designed 
for administration in three or four sessions and requires 459 minutes 
of pupil working time. Percentile grade norms for pupil scores and 
also for school averages are available both on a nation-wide basis 
and for several regions of the country separately. 


TABLE 46. Summary of tests and timing: Iowa Tests of Educational
Development 13

Type                           Test   Title                                            Timing

Background Tests               1      Understanding of Basic Social Concepts           85
                               2      Background in the Natural Sciences               60
                               3      Correctness and Appropriateness of Expression    60
                               4      Ability To Do Quantitative Thinking              65

Reading Tests                  5      Ability To Interpret in the Social Studies       60
                               6      Ability To Interpret in the Natural Sciences     60
                               7      Ability To Interpret Literary Materials          50

Vocabulary Test                8      General Vocabulary                               22

Sources of Information Test    9      Uses of Sources of Information


13 E. F. Lindquist, general editor, Iowa Tests of Educational Development.
Science Research Associates, Chicago, 1942, 1951.
USAFI Tests of General Educational Development


The U. S. Armed Forces Institute Tests of General Educational
Development were issued near the end of World War II primarily 
for use in the accreditation by examination of veterans returning to 
civilian life with educational competencies at levels beyond those 
for which they had obtained formal credit. The five tests of the 
battery are closely similar in title and content to five of the Iowa
Tests of Educational Development. 

At first restricted for use by the U. S. Armed Forces Institute, the 
battery of five tests in English; interpretation of reading materials 
in social studies, science, and literature; and general mathematics 
was later made available for general use in Form B. Approximately 
two hours of working time is required for each test. The battery is 
intended for use with high-school seniors and adults. Percentile norms 
for each of the five tests are provided for all high-school graduates 
and separately for high-school graduates from six regions—the New 
England, Middle Atlantic, Southern, North Central, and North- 
western states and the state of California. 


TABLE 47. Summary of tests and timing: USAFI Tests of General
Educational Development 14

Test    Title                                                          Timing

I       Correctness and Effectiveness of Expression                    120
II      Interpretation of Reading Materials in the Social Studies      120
III     Interpretation of Reading Materials in the Natural Sciences    120
IV      Interpretation of Literary Materials                           120
V       General Mathematical Ability                                   120


Topics for Discussion 


1. What major functions are served by batteries of general achieve-
ment tests? 

2. What are some of the advantages of general achievement test 
batteries? 

3. What are some of the limitations of general achievement test 
batteries? 


14 Examinations Staff, United States Armed Forces Institute Tests of General 
Educational Development, High School Level. American Council on Education, 
Washington, D. C., 1944. 
4. How do the functions of these test batteries and of analytic or
diagnostic tests differ? 
5. What are some of the instructional outcomes achievement test 
batteries fail to measure? 
6. How may general achievement tests be used in the grade place- 
ment and sectioning of pupils in a school system? 
7. Into what major types may achievement test batteries be classified? 
8. How do tests of general educational achievement differ from other 
general achievement test batteries? 
9. What are the advantages of the pupil profile charts provided with 
most achievement test batteries? 
10. How can achievement test batteries and pupil profile charts be used 
in the measurement of the educational progress of pupils? 
Glossary


ability. The capacity or power to produce. 

accomplishment. See achievement. 

accomplishment quotient. See achievement quotient. 

accuracy. The ratio between the number of items answered correctly 
and the number of items attempted. 

achievement. The accomplishment or production of the pupil in his 
school work. 

achievement age. A pupil's level of accomplishment in a particular 
school subject or field. 

achievement quotient (AQ). The ratio between educational age 
and mental age. 

achievement test. A test that measures the pupil accomplishment 
resulting from instruction and learning. 

adequacy. An examination criterion indicating the degree to which a 
test samples extensively or widely over the content or activities to 
be tested. 

adjustment. The process of effecting a satisfactory adaptation to 
one's environment.

adjustment inventory. An instrument used to determine how satis- 
factorily the individual has adapted himself to his environment. 

administrability. An examination criterion indicating the character- 
istics of a test that make for ease and accuracy in giving it. 

age-at-grade norms. Norms based on pupils grouped or classified by 
ages within their school grades.

age equivalent. The score derived from age norms on a standardized
test. 

age norms. Tables of values representing typical or average perform- 
ance on standardized tests for pupils in different age groups. 

alternate-response item. A type of test item to which the pupil re-
sponds by indicating which of the two possible answers is right
and which is wrong.

ambiguity. The quality of a test item that makes possible more than
one logical interpretation of its intent or meaning.

analogies test. A test of logical reasoning ability involving simi-
larities and dissimilarities.

analysis. The process of reducing or taking apart a total performance
in the identification of specific skills. 

analytic test. A test that furnishes a basis for the analysis of skills 
underlying a performance by securing different measures of abilities 
contributing to total performance. 

anecdotal record. An objective account of pupil behavior made by
the teacher or some other person observing a significant event in the 
life of the pupil. 

answer sheet. A separate piece of paper, usually printed, on which 
the pupil records his responses for a test. 

applications. An instructional or learning outcome involving the use 
of skills, knowledges, concepts, and understandings in practical 
situations. 

appraise. See evaluate. 

appreciations. An instructional or learning outcome involving a judg- 
ment concerning the worth of a piece of art, an event, or an 
experience. 

aptitude. An ability in a certain field or area of performance. 

aptitude test. A test of specific intelligence, i.e., intelligence as it
operates in a certain field or area of performance.

arithmetic mean (A.M.). The point on the scale above which and 
below which the sums of the deviations are equal; the sum of the 
scores divided by their number. 

array. A collection of data arranged in a systematic order. 

association method. A technique of personality evaluation involv-
ing free responses to certain stimuli. 

assumed mean. The midpoint of the class interval in which it is 
"guessed" that the arithmetic mean will fall. 

attitudes. An instructional or learning outcome represented by a state 
of readiness which exerts a directive, and sometimes a compulsive, 
influence upon an individual's behavior. 

attitudes scale. An instrument used in the determination of pupil
opinions or beliefs on an issue or issues which may be contro-
versial in nature.

average. A generic term for measures of central tendency. 
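
The accuracy and arithmetic mean entries above both reduce to simple arithmetic. The following is a minimal illustrative sketch in Python and is not part of the original glossary; the function names and the sample scores are hypothetical.

```python
def accuracy(correct: int, attempted: int) -> float:
    """Ratio of items answered correctly to items attempted."""
    if attempted <= 0:
        raise ValueError("at least one item must be attempted")
    return correct / attempted


def arithmetic_mean(scores):
    """Sum of the scores divided by their number."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)


# Hypothetical data: a pupil attempts 24 items and answers 18 correctly.
pupil_accuracy = accuracy(18, 24)

# Hypothetical class scores.
scores = [12, 15, 18, 20, 25]
mean = arithmetic_mean(scores)

# The mean is the point about which deviations above and below balance.
sum_above = sum(s - mean for s in scores if s > mean)
sum_below = sum(mean - s for s in scores if s < mean)
```

With these sample values the sum of the deviations above the mean equals the sum of the deviations below it, which is the balancing property named in the arithmetic mean entry.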


basic skills. Tool skills, such as those of reading, language, and 
arithmetic. 

basic skills test. An achievement test measuring performance in such 
types of communication as speaking, listening, reading, writing, 
and computing. 


behavior. All types of responses made by the individual, particularly 
those that can be observed. 
best answer item. A type of multiple-choice item to which the pupil
responds by attempting to select the best answer from alternatives 
of which more than one may be correct. 

bi-factor test. A type of intelligence test from the use of which two 
scores for separate aspects of mental ability are obtained. 

business education. Such subjects as business arithmetic, English,
and law; bookkeeping; stenography and typewriting; clerical train-
ing; consumer problems; and distributive education, as used
here.


capacity. The power to learn or profit from experience. 

case study. A comprehensive approach to the evaluation of the total 
personality of the individual pupil. 

central tendency. A term corresponding to average, commonly ap- 
plied to the arithmetic mean, median, and midmeasure. 

"chance-half" coefficient. An estimate of test reliability useful when
only one form of a test is available. 

check list. A list of steps in performing a certain operation used by 
an observer in evaluating pupil proficiency in some skill. 

chronological age (CA). Life age; the number of years since birth. 

class analysis chart. A device for the graphical representation of 
class performance and individual pupil performance on the various 
parts of certain achievement tests. 

classification. The process of assigning a pupil to the grade or unit 
of a school for which his abilities and training best fit him. 

class interval (c.i.). One of the divisions of a frequency distribution. 

classroom test. A test made by the teacher or within a school system 
for use in specific classes. 

clues. Characteristics of test items which frequently aid the pupil 
in determining the correct answers. 

coefficient of alienation (k). An index of the degree to which two 
variables are unrelated. 

coefficient of correlation (r). A measure of relationship that ranges
in value from +1.00 through zero to -1.00; refers here mainly
to the Pearson product-moment coefficient.

comparability. An examination criterion indicating the characteristic 
of a test that enables the user to obtain from different administra- 
tions of the test results that have equivalent meaning. 

comparable measures. Scores or values that are expressed in terms 
of the same unit and with respect to the same point of origin. 

completion exercise. A type of test exercise to which the pupil re-
sponds by filling the blanks of a statement with the words, numbers, 
or phrases he believes will correctly complete the meaning. 
composite score. A single value used to express the results obtained
from the use of several different measures. 

comprehension score. A score indicating the degree of a pupil's
understanding of an exercise or of material read. 

concepts. An instructional or learning outcome involving comprehen- 
sion of meaning. 

constant error. A type of deviation from complete accuracy that re- 
sults from the tendency of some scorers to give high marks and of 
other scorers to give low marks consistently. 

content subjects. Fields in which mastery consists mainly in the
acquisition of informations and attitudes, as the social sciences and
sciences.

correction. An adjustment used in computing the arithmetic mean,
standard deviation, and correlation coefficient by the short method.

correction for chance. A practice followed in scoring some types
of objective tests to take account of guessing. 

corrective teaching. Steps taken to remedy observed defects or 
difficulties in pupil learning. 

correlation. The degree of relationship existing between two or more 
sets of measures. 

correlation chart. A two-way or double-entry table that shows the
relationship existing between pairs of measures for the same indi- 
viduals or items. 

correlation coefficient. See coefficient of correlation. 

criterion. A standard by which a test or other product is judged or
evaluated. 

cumulative frequency. The sum of all the scores in a frequency 
distribution up to any given point. 

cumulative frequency distribution. A distribution of cumulative 
frequencies. 

cumulative frequency graph. A graphical representation of a 
cumulative frequency distribution. 

cumulative pupil record. A comprehensive, cumulative record of 
pupil background, ability, achievement, and behavior. 

curricular validity. Evidence of test validity shown by adequate 
coverage of curriculum content by a test. 

cursive writing. Handwriting with the letters joined. 
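
The coefficient of correlation and coefficient of alienation entries above can be illustrated numerically. The sketch below is an illustration only and is not taken from the text: it computes the Pearson product-moment coefficient from deviations about the means, and it assumes the customary definition of the coefficient of alienation, k = sqrt(1 - r^2), which the glossary entry itself does not spell out. The paired scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment coefficient for paired measures."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists of at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


def alienation_k(r):
    """Coefficient of alienation, assuming k = sqrt(1 - r^2)."""
    return math.sqrt(max(0.0, 1.0 - r * r))


# Hypothetical paired scores for five pupils on two tests.
test_a = [1, 2, 3, 4, 5]
test_b = [2, 4, 5, 4, 5]

r = pearson_r(test_a, test_b)
k = alienation_k(r)
```

For these sample scores r is roughly .77 and k roughly .63, so a fairly high correlation still leaves a substantial degree of unrelatedness.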


decile. One of the nine points that divide a distribution into ten 
equal areas. 

derived score. A value having comparable meaning for the results 
from various tests. 
deviation. The amount by which a score or other measure differs
from the central tendency of the group of scores in which it is 
included. 

diagnosis. The identification and location of specific strengths or 
weaknesses in performance. 

diagnostic test. A test used to locate the nature, and if possible the 
causes, of disability in performance. 

differential aptitude tests. A term often applied to multiple-factor 
tests of mental ability. 

difficulty. The characteristic in a test item that results in a small 
percentage of correct responses. 

directed observation. A technique of personality study involving 
observation of certain specific types of behavior in the pupil. 

discriminative power. The quality of a test item that results in 
adequate distinctions in percentages of correct answers by pupils 
of varying ability levels. 

dispersion. See variability. 

double-entry table. See correlation chart. 

drill test. A paper-and-pencil instrument designed for use by pupils 
in practicing certain skills. 

duplicate forms. See equivalent forms. 


economy. An examination criterion indicating the cost of a test in 
time and money requirements. 

educational age (EA). A pupil’s level of accomplishment in a num- 
ber of school subjects. 

educational quotient (EQ). The ratio between educational age and 
chronological age. 

educational test. A measuring instrument that appraises the results 
or effects of instruction and learning. 

emotional adjustment inventory. See adjustment inventory. 

equated scores. Derived scores that are comparable from test to
test of a certain battery.

equivalent forms. Duplicate or equal forms of a standardized test
that yield closely similar scores.

error of grouping. A variable error introduced by the practice of
combining in class intervals scores or measures that are unlike.


essay examination. A test to which the pupil ordinarily responds 
with written discussion of issues raised in several broad questions. 

evaluate. To test, measure, and appraise the “whole” child by the 
use of tests and a wide variety of non-test tools and techniques. 


examination. See test. 
exercise. A unit of a test governed by a specific set of directions.

expectancy. The standard of future achievement held reasonable 
for the individual pupil. 

expressive language arts. English, handwriting, and spelling, as 
used here. 

extensive sampling. See adequacy. 

extrapolation. The process of locating a point beyond two or more 
known points in accordance with the conditions operating in the 
given case. 
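
The educational quotient entry above, like the achievement quotient and intelligence quotient entries elsewhere in this glossary, defines a ratio of two ages. A minimal sketch in Python follows; it is illustrative only. Multiplying each ratio by 100 is the customary convention and is an assumption here, since the entries speak only of ratios. The function name and the ages, given in months, are hypothetical.

```python
def quotient(numerator_age: float, denominator_age: float) -> float:
    """Ratio of two ages in the same unit, multiplied by 100 by convention."""
    if denominator_age <= 0:
        raise ValueError("the denominator age must be positive")
    return 100 * numerator_age / denominator_age


# Hypothetical pupil, all ages in months.
chronological_age = 144   # CA: 12 years 0 months
mental_age = 156          # MA: 13 years 0 months
educational_age = 150     # EA: 12 years 6 months

iq = quotient(mental_age, chronological_age)        # MA / CA
eq = quotient(educational_age, chronological_age)   # EA / CA
aq = quotient(educational_age, mental_age)          # EA / MA
```

Because all three quotients are built from the same three ages, the achievement quotient can also be recovered as 100 times EQ divided by IQ.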


factor analysis. A method widely used in the study of the nature of 
mental and other abilities. 

factored test. A test from the use of which several scores represent- 
ing different factors of general ability are obtained. 

faculty theory. The theory that intelligence consists of a large num-
ber of relatively independent and largely uncorrelated and specialized
abilities, such as memory and imagination.

feeble-minded. The term used to designate persons of inferior in- 
telligence having IQs below 70. 

fine arts. Music and art, as used here. 

first quartile (Q1). The point on a scale of values below which 25
per cent of the cases fall; the 25th percentile. 

"footrule" coefficient. An index giving an estimate of test reliability 
useful when only one form of a test is available. 

foreign languages. Primarily French, German, Italian, Spanish, and 
Latin, as used here. 

form. One of the two or more arrangements of closely similar or 
equivalent standardized tests that in itself constitutes a testing unit. 

frequency (f). The number of measures in a given class interval of 
a frequency distribution. 

frequency curve. See frequency polygon. 

frequency distribution. The table in which scores or other meas- 
ures are classified. 

frequency polygon. A type of graphical representation used to show 
the manner in which scores in a frequency distribution are dis- 
tributed. 

fulcrum. The axis upon which a lever is supported and rotated. 
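
The first quartile, frequency, and frequency distribution entries above can be tied together with a short computation. The sketch below is an illustration and not the authors' own procedure: it locates a percentile point in a grouped frequency distribution by interpolating within the class interval that contains it. It assumes whole-number scores, so that the real lower limit of an interval lies 0.5 below its integral lower limit. The function name and the distribution are hypothetical.

```python
def percentile_point(lower_limits, freqs, width, pct):
    """Point below which pct per cent of the cases in a grouped distribution fall."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution is empty")
    target = n * pct / 100          # number of cases to count off
    cumulative = 0                  # cases below the current interval
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            real_lower = lower - 0.5   # assumption: whole-number scores
            return real_lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("pct must lie between 0 and 100")


# Hypothetical distribution with class intervals 10-14, 15-19, ..., 30-34.
lower_limits = [10, 15, 20, 25, 30]
freqs = [2, 6, 8, 3, 1]
width = 5

q1 = percentile_point(lower_limits, freqs, width, 25)
median = percentile_point(lower_limits, freqs, width, 50)
```

Here 20 cases give a target of 5 cases for the first quartile; 2 cases lie below the interval 15-19, so the quartile falls three-sixths of the way through that interval.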


general ability. Closely similar to general intelligence; ability to 
learn. 

general achievement test. An educational test covering several 
fields of study and ordinarily adapted for use in several grades. 

general intelligence test. A test of general mental ability. 


genius. A person of superior intelligence having an IQ of 140 or
above. 

gradation. See classification. 

grade. The administrative division of the school that indicates the 
educational level of the pupil. 

grade equivalent. The score derived from grade norms on a stand-
ardized test.

grade norms. Tables of values representing typical or average per- 
formance on standardized tests for pupils in different grades. 

group dynamics. Interactions among the individual members of a 
group engaged in some cooperative activity. 

group factors of intelligence. The different phases or aspects of 
intelligence resulting from scientific analyses of intellectual abili- 
ties. 

group-factor test. A type of intelligence test from the use of which 
separate scores for several aspects of mental ability are obtained.

group test. A test that can be administered to a number of pupils 
at the same time. 

grouping. The process of classifying and tabulating data into class
intervals or steps.


half-sum. A term used in the calculation of the median. 


halo effect. The tendency of a teacher to be influenced in rating pupil
performance by impressions previously acquired.

health education. Health facts, attitudes, and practices, as used
here.

histogram. A type of graphical representation employing only hori-
zontal and vertical lines.


identification test. A test of ability to recognize and name objects
shown or pictured.

idiot. A feeble-minded person having an IQ below 25.

imbecile. A feeble-minded person having an IQ from 25 to 49.

index of brightness (IB). A measure of brightness somewhat simi-
lar to the intelligence quotient in meaning.

index of studiousness. The difference between a pupil's rank in his
class on intelligence and on achievement.

individual differences. The observed or measured variation of in-
dividuals in ability, progress, or achievement.

individual test. A test that can be administered to only one pupil
at a time.

industrial arts. Such subjects as manual training, shop, home me-
chanics, and mechanical drawing, as used here.

informal objective test. A teacher-made objective test.

instructional objective. An aim or purpose of instruction. 

instructional outcome. A result of instruction stated in terms of 
pupil behavior. 

instructional test. A test used directly in teaching a unit of material. 

integral limits. The lower and upper whole-number limits of a class 
interval in a grouped frequency distribution. 

intelligence. The ability to adapt oneself to changing conditions; 
ability or power to learn. 

intelligence quotient (IQ). The ratio between mental age and 
chronological age. 

intelligence test. A test that measures ability to learn or to profit 
from experience. 

intensive sampling. A narrow and inadequate selection of test items 
that results in a test of too little scope or range. 

interests. An instructional or learning outcome represented by a men- 
tal set that urges a person to act in a certain manner. 

interests inventory. An instrument used in the determination of
pupil interests in various fields or areas of performance. 

interpolation. The process of locating an intermediate point between 
two known points in accordance with the conditions operating in 
the given case. 

interpretive test. An achievement test in which items are based on 
data presented in verbal, numerical, or graphical form. 

interval. See class interval. 

interval deviation. The number of class-interval units by which a 
certain interval in a frequency distribution differs from the interval 
in which the arithmetic mean is assumed to lie. 
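
As a hedged illustration of how interval deviations are used, the sketch below computes the mean of a grouped distribution by the assumed-mean method: mean = assumed midpoint + interval width × (sum of frequency × interval deviation) / N. The function name, the sample data, and the assumption of equal-width intervals listed in ascending order are all illustrative.

```python
def mean_by_interval_deviation(midpoints, frequencies, assumed_index):
    """Mean of grouped data computed from interval deviations.

    midpoints     -- class-interval midpoints in ascending order, equal width
    frequencies   -- number of scores falling in each interval
    assumed_index -- position of the interval assumed to contain the mean
    """
    width = midpoints[1] - midpoints[0]
    n = sum(frequencies)
    # Interval deviation of interval i is (i - assumed_index) class-interval units.
    sum_fd = sum(f * (i - assumed_index) for i, f in enumerate(frequencies))
    return midpoints[assumed_index] + width * sum_fd / n


# Hypothetical distribution: five intervals of width 5, twenty scores in all.
midpoints = [52, 57, 62, 67, 72]
frequencies = [2, 5, 8, 4, 1]
print(mean_by_interval_deviation(midpoints, frequencies, assumed_index=2))
```

Whichever interval is chosen as the assumed mean, the result agrees with the mean computed directly from the midpoints and frequencies; the choice only changes the size of the correction term.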

interview. A personal conference technique frequently used in diag- 
nosis and in the evaluation of attitudes. 

inventory. A personal-report type of scale or test commonly used in 
measuring personality. 

inventory test. A test used as a preliminary check on the degree of 
mastery existing prior to instruction. 

item count. A method used to determine whether test items properly 
discriminate between pupils of various ability levels. 


job analysis. The process of breaking down a certain task into its 
elements or component parts. 


knowledges. An instructional or learning outcome represented by 
the ability to recall or recognize facts, persons, places, or things. 
learning outcome. A result of experience in or outside of the school
stated in terms of pupil behavior. 

listening test. A test of ability to comprehend spoken language. 

logical validity. See psychological validity. 


machine-scored test. A test that can be scored by the use of an 
electrical or mechanical scoring machine. 

manual arts. See industrial arts. 

manuscript writing. A free-hand style of lettering in which the 
letters are not connected as in common script writing. 

mark. The teacher’s numerical or letter evaluation of pupil achieve- 
ment in a course or area of performance. 

mastery test. An achievement test designed to determine how thor- 
oughly a pupil has learned certain facts or skills. 

matching exercise. A type of test exercise to which the pupil re- 
sponds by attempting to pair the related items in two or more 
columns of related facts or ideas. 

mathematics. Such subjects as general mathematics, algebra, plane 
and solid geometry, and trigonometry, as used here. 


mean. See arithmetic mean.

measure. To test by means of standardized and teacher-made
instruments mainly in the fields of achievement and intelligence; a test
score or other numerical rating.

median (Mdn.). The point on the scale below which half of the
measures in a frequency distribution fall.

mental ability. Ability or power to learn; nearly synonymous with
intelligence.

mental age (MA). The intelligence or mental ability of a person
expressed in terms of the chronological age of which his mental
ability is typical.

mental test. A test of intelligence or personality, as distinguished
from an educational test.

metronoscope. A device for exposing strips of reading material for
reading drill.

midmeasure. The middle measure of a series of values arranged in
order of magnitude.

midpoint. The exact middle of a class interval in a frequency
distribution.

moron. A feeble-minded person having an IQ from 50 to 69.

multiple-choice item. A type of test item to which the pupil
responds by attempting to select the correct or best response from
the several alternatives given.

multiple-factor test. See group-factor test.

multiple-response item. A type of test item to which the pupil
responds by attempting to indicate all correct answers.


new-type examination. See informal objective test. 

non-language test. A test not involving the use of words in its ad- 
ministration or taking, e.g., a test given by pantomime. 

non-test tool. An instrument other than a test used in measuring 
pupil behavior. 

non-verbal test. A test not involving the use of words by the pupils 
in attaching meaning to the items, e.g., a figure analogies test. 

normal. Typical in progress, growth, development, or distribution. 

normal curve. The graphic representation of a large number of cases 
in the selection of which chance was operative. 

norms. The median or average performances on standardized tests of 
pupils of different ages or grade placement or of school groups. 


object test. A test involving the use of three-dimensional objects.

objective. An aim or purpose. 

objective test. A test for which the scoring procedure eliminates
subjective opinion and judgment. 

objectivity. An examination criterion indicating the degree to which 
subjective opinion and judgment are eliminated in the process of 
scoring it. 

objectivity coefficient. A correlation coefficient used in describing
the objectivity of a test. 

observational methods. Certain techniques of personality study, 
e.g., directed observation and the anecdotal method. 

ogive. See cumulative frequency graph. 

ophthalmograph. A binocular camera used in measuring eye
movements during reading.

oral examination. A test administered and answered orally. 

outcome. A result stated in terms of pupil behavior. 


percentile. One of the ninety-nine points that divide a distribution 
into one hundred equal areas. 

percentile curve. See cumulative frequency graph. 

percentile-grade norms. Tables of percentile ranks on test scores 
for pupils in different school grades. 

percentile norms. Tables of values representing percentile ranks
of pupils on standardized tests for certain subjects or certain
grades.
percentile rank. The position assigned to a score in an array for
which the scores are divided into one hundred equal divisions in 
descending order. 

performance. The accomplishment, achievement, or behavior of the 
pupil. 

performance test. A test to which the pupil typically responds by 
motor or manual rather than by verbal behavior. 

personal constant (PC). A measure of brightness obtained by the
use of Heinis growth units for both the mental age and the chrono- 
logical age. 

personal reports. The responses given by the pupil on certain types 
of personality scales and inventories. 

personality. An individual's total behavior in social situations. 

personality inventory. An instrument that measures such intangi- 
ble aspects of behavior as attitudes, interests, and adjustment. 

personality quotient (PQ). A quotient sometimes used in the meas- 
urement of total personality. 

physical education. Motor skills, attitudes, and activities, as used 
here. 

point score. See raw score. 

power test. A test that measures the difficulty of the task the pupil 
is just able to perform. 

practical arts. Such subjects as domestic science or home economics,
cooking, sewing, and home management, as used here.

practicality. An examination criterion indicating the degree to which 
a test possesses certain utilitarian characteristics. 

practice effect. The influence of a previous experience with a test 
on a later encounter with the same or a similar test.

practice exercise. A few sample items preceding a test designed to 
familiarize the pupils with the nature of the test. 

practice test. See drill test. 

preference. A liking for or predisposition toward some person, ac- 
tivity, or practice. 

preventive teaching. Steps taken at the time of initial instruction
to guard against the later appearance of defects or difficulties in
pupil learning.

primary mental abilities. A term often applied to factors of mental
ability.

product scale. See source scale.

profile chart. A device used for graphical representation of scores
made by the pupil on the various parts of certain achievement,
intelligence, and personality tests.

prognostic test. A test used to predict future success in specific
subjects or fields.

progress record. A device similar to a profile chart on which pupil
progress from year to year can be shown graphically for certain
achievement tests.

projective method. A technique of personality study involving the
observation of how a person reacts to certain toys and materials.

prophecy formula. The Spearman-Brown formula used in estimat- 
ing test reliability from a correlation coefficient between scores on 
“chance-halves” of a test. 
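
The Spearman-Brown formula named in this entry is, for a test split into two halves, r_full = 2r / (1 + r), where r is the correlation between scores on the two halves. The short Python sketch below only illustrates that arithmetic; the function name and the sample coefficient are assumptions.

```python
def spearman_brown_full_test(r_half: float) -> float:
    """Estimate full-test reliability from a half-test correlation.

    r_half is the correlation between scores on the two chance-halves.
    """
    if r_half <= -1.0:
        raise ValueError("correlation must be greater than -1")
    return 2.0 * r_half / (1.0 + r_half)


# A hypothetical correlation of .60 between halves steps up to about .75.
print(round(spearman_brown_full_test(0.60), 2))
```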

psychological examination. See intelligence test. 

psychological validity. Evidence of test validity resulting from a 
logical dissection of a total learning process. 


quality scale. A series of standard graded samples with which the
production of the pupil is compared in evaluating performance in 
such areas as handwriting and composition. 

quartile. One of the three points that divide a distribution into four 
equal areas. 

quiz. A short achievement test covering an assignment or a restricted 
unit of course content. 

quotient. A ratio designed to reveal in a single numerical index the 
relative position of the pupil on two related variables. 


range (R). The distance from the lowest to the highest score in a
series of scores.


rate score. A score expressing a pupil's rate of work. 

rate test. A test that measures speed of performance on tasks of 
uniform difficulty. 

rating scale. An instrument used by a teacher or other person in the 
evaluation of pupil personality or achievement. 

raw score. The quantitative result obtained directly from the scoring 
of a test or scale. 

readiness test. A test that measures the ability of the pupil to under- 
take a new type of specific learning. 

real limits. The actual or true lower and upper limits of a class in- 
terval in a frequency distribution. 

recall item. A type of test item to which the pupil responds by
writing words, numbers, or phrases to complete the meaning of a
statement.


receptive language arts. Reading, study methods, and literature,
as used here. 
recognition item. A type of test item to which the pupil responds
by indicating the truth or falsity of statements, selecting the cor- 
rect or best answer from among several given, or indicating the 
proper pairing of related items. 

relative rank. The position assigned to a score in an array for which 
the scores are arranged in descending order. 

reliability. An examination criterion indicating the degree to which 
a test measures what it does measure; consistency of measurement. 

reliability coefficient. The correlation coefficient obtained between 
scores made by the same pupils on two equivalent forms of a test. 

remedial. Having as a purpose the correction of observed difficulties 
and weaknesses in performance. 

remediation. See corrective teaching. 

retesting coefficient. An estimate of test reliability that can be ob- 
tained when only one form of a test is available. 


sampling. The process of selecting a limited number of cases or items 
that will be representative of the large group from which they are 
chosen. 

scale. An instrument used by the scorer in evaluating pupil perform- 
ance or by the test-maker in constructing a test; the continuum 
from the lowest to the highest score in a frequency distribution. 

scaled score. A derived score based upon deviation from the
arithmetic mean in units of one-tenth of a standard deviation for a
group established in a certain manner.

scaled test. A test in which the items are arranged in an order of 
increasing difficulty. 

school norms. Tables of percentile ranks based on the mean test 
scores of pupils in different schools. 

sciences. Such content subjects as general science, biology, physics, 
and chemistry, as used here. 

scorability. An examination criterion indicating the characteristics 
of a test that make for ease and simplicity in scoring it. 

score. A quantitative description of performance. 

score card. A short and simple type of rating scale used in the
evaluation of products made by pupils. 

score deviation. The number of score units by which a certain score 
in a frequency distribution differs from the mean or the assumed 
mean. 

self-marking test. A test that does not require the use of scoring 
keys or machines in the scoring process. 

sigma (σ). See standard deviation.
simple recall item. A type of test item to which the pupil responds
by writing the word, number, or phrase that he believes will cor- 
rectly complete a statement or answer a question. 

simulated-conditions test. A test in which the conditions represent 
or approximate in nature those of the ultimate performance it is 
used to evaluate. 

skills. An instructional or learning outcome involving some form of
physical or motor performance. 

social studies. Such content subjects as history, civics and Ameri- 
can government, geography, and problems of democracy, as used 
here. 

social utility. A point of view basic to the selection of curricular 
materials which holds that subject matter should contribute defi- 
nitely to child and adult needs. 

sociogram. A graphic device for representing interpersonal relations
within a group of pupils. 

sociometric methods. Certain procedures adapted from sociology for 
use in evaluating pupil behavior. 

source scale. A series of items of graded difficulty from which tests
can be constructed, e.g., a spelling scale. 

specific determiners. Characteristics of true-false test items that 
seem to determine in part the nature of the correct response.

speed test. See rate test. 

standard. A level of performance agreed upon by experts or estab- 
lished by local school officers as a goal of pupil attainment. 

standard deviation (S.D.). The most widely useful measure of 
variability or dispersion. 

standard error of measurement. A measure of score accuracy used 
in estimating test reliability. 

standard score. A derived score based upon deviation from the
arithmetic mean in terms of the standard deviation. 

standardization. The process of constructing a test and establishing 
norms for it. 

standardized test. A test for which the items have been carefully 
selected and evaluated and which is accompanied by norms. 

statistical validity. Evidence of test validity shown by correlational 
relationship or other statistical procedures. 

step. See class interval. 

structured inventory. A personal-report type of personality scale 
to which the pupil must respond in one of the several prescribed 
ways. 

subjectivity. The degree to which measurement results are influenced 
by personal opinions or judgment.
sub-total. A term used in the calculation of the median.

survey test. A test that measures general achievement in certain sub- 
jects or fields. 

synthesis. The process of combining underlying and somewhat iso- 
lated skills so that they form an effective unit. 


T-score. A derived score based upon deviation from the arithmetic 
mean in units of one-tenth of a standard deviation. 
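
A minimal Python sketch of this conversion follows. It assumes the common convention that the mean is assigned a T-score of 50, so that each tenth of a standard deviation is one T-score point (T = 50 + 10z); it also assumes the standard deviation is computed over the whole group with `statistics.pstdev`. The function name and sample scores are illustrative.

```python
import statistics


def t_scores(raw_scores):
    """Convert raw scores to T-scores, assuming T = 50 + 10 * z."""
    mean = statistics.fmean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # standard deviation of the whole group
    if sd == 0:
        raise ValueError("all scores are identical; T-scores are undefined")
    return [50.0 + 10.0 * (x - mean) / sd for x in raw_scores]


# Hypothetical raw scores; the score equal to the mean receives a T-score of 50.
print([round(t, 1) for t in t_scores([40, 50, 60])])
```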

tabulation. The process of grouping and classifying data; the dis- 
tribution into which data are classified. 

tachistoscope. A device for exposing strips of reading material for 
reading drill. 

talent. See aptitude. 

taste. See preference. 

teacher-made test. A test constructed by the teacher, such as the 
essay and informal objective tests. 

teacher’s mark. See mark. 

technique. A procedure or method. 

telebinocular. A type of stereoscope adjustable for various distances. 

test. In the general sense any instrument used in the measurement 
of any educational or mental ability and in a specific sense an 
instrument used by the pupil and ordinarily involving the use of 
paper and pencil; to measure by the use of tests. 

test battery. A group of several tests covering a number of different 
subjects and intended for use in testing over wide areas. 

test item. The smallest unit of a test. 

test rating scale. A scale used in the evaluation of tests for specific 
uses. 

third quartile (Q3). The point on a scale of values below which
75 per cent of the cases fall; the 75th percentile. 

time-limit test. A test on which the working time allowed pupils is 
rigidly prescribed. 

tool. An instrument of a test or non-test type used in measuring pupil 
behavior. 

tool subjects. Fields in which achievement consists mainly in the 
acquisition of skills and techniques useful in further learning, as 
reading, arithmetic, and spelling. 

traditional examination. See essay examination. 

true-false item. A type of alternate-response item to which the pupil 
responds by indicating whether a statement is true or false. 

two-factor theory. The theory that intelligence consists of a general 
factor, many specific factors, and a number of group factors. 
understandings. An instructional or learning outcome involving
comprehension of meaning and of the uses and significance of what
has been learned. 

unstructured technique. A projective method of personality meas- 
urement in which the pupil has wide freedom in his manner of 
responding. 

utility. An examination criterion indicating the degree to which a 
test serves a definite need. 


validity. An examination criterion indicating the degree to which a 
test measures what it purports to measure. 

validity coefficient. A correlation coefficient used in expressing 
the validity of a test. 

variability. The spread or dispersion of scores. 

variable. A quality that may exist in different amounts. 

variable error. A type of deviation from complete accuracy that 
results from the tendency of persons to vary in their judgments 
from time to time. 

verbal test. A test involving the use of language in the form of words 
by the pupil in attaching meaning to, responding to, or both attach- 
ing meaning to and responding to the items. 


work-limit test. A test on which sufficient time is allowed for all or 
nearly all pupils to complete their work. 

work-sample test. A test consisting of a representative portion of
the ultimate performance it is used to evaluate. 


work-type reading. The types of silent reading skills commonly 
utilized in study. 


yes-no item. A type of alternate-response item to which the pupil 
responds by an affirmative or negative answer to a question. 
Webb, L. W., 18, 36, 64, 198, 277, 338, 391
Webb, Sam C., 552
Weidemann, Charles C., 159
Weitzman, Ellis, 64, 198, 338, 370, 391, 464, 502, 552
Wellman, Beth L., 260
Wesman, Alexander G., 255
Whitford, William G., 554, 566, 567, 573
Winslow, Leon L., 554, 564, 572
Witty, Paul A., 421
Wood, Ben D., 18, 137, 277, 421, 464
Woodburn, John, 552
Woodruff, Asahel D., 33, 36, 168
Woods, Roy C., 573
Woodside, C. W., 612
Woodyard, Ella, 260
Workman, Linwood L., 63, 370
Wren, F. Lynwood, 529
Wrightstone, J. Wayne, 36, 47, 64, 154, 155, 203, 218, 219, 237, 486, 502
Wrinkle, William L., 237

Yoakam, G. A., 402
Yocom, Rachael D., 629

Zapf, Rosalind M., 548
Zimmerman, John G., 542

Abbott-Trabue Exercises in Judging
Poetry, 413 
Ability, defined, 340 
Accomplishment quotient, 267-68, 346 


Achievement, performance tests of, 52-54
Achievement quotient, 267-68, 346 
Achievement testing, 13-14 
Adequacy, 75-77 
defined, 75-76 
Adjustment, 168 
Adjustment inventories, 59, 292-96 
Adjustment measurement, 292-96 
Administrability, 79-80 
defined, 79 
Age, mental, 257-58
Age equivalents, 344 
Age norms, 96, 363 
Age-at-grade norms, 98-99 
Algebra: measurable qualities in, 518 
measurement and evaluation in, 517-20
objectives in, 517-18 
prediction of success in, 525-26 
standardized tests in, 518-20 
Alternate-response items, 180-82, 493 
constructing, 191-92 
Ambiguity, 188 
freedom from, 90 
American Council Alpha French Test, 475

American Council on Education Cumulative Record for Elementary and
Secondary Schools, 228

illus., Fig. 17, 230-31 

American Council on Education Psy- 
chological Examination, 253, 264 

American Handwriting Scale, 456 

American School Achievement Tests, 
634 

Analysis (Fig. 1), 50 

diagnosis by, 113-14, 457-58 

Analytic tests, 48-49 

Analytical Scales of Attainment in Literature, 413

Anderson Chemistry Test, 540

Anecdotal method, 283

Anecdotal record, 61, 297-98

Answer sheets, separate, 132-34

Applications, 168, 170, 538

Applied art, 570

Appreciations, 168

Aptitude, in mathematics, 525-26

Aptitude measurement: in English, 439-40
purposes of, 439

Aptitude testing in the Sciences, 549

Aptitude tests, 31, 36-57, 250-51, 273, 584-85
in English, 439-40
in the foreign languages, 477-78
in shorthand, 605-6
Arithmetic, in the junior-senior high
school, 508-10
diagnostic testing in, 513 
measurement in, 511-17 
Arithmetic Achievement Tests, 5-12 
Arithmetic mean: defined, 318
of grouped data, computing the, 318
of ungrouped data, computing the, 317
Arithmetic skills, 508-10
Army Alpha test, 30, 248
Army Beta test, 30, 248, 257
Army General Classification Test, 30, 248
Army Individual Test of Mental Ability, 30
“Around the World” Attitudes Inven- 
tory, 286 
Art abilities: and achievement, meas- 
urement of, 567-70 
tests, 569-70 
Art appreciation, tests of, 568-69 
Art education: characteristics and aims 
of, 563-67 
measurement and evaluation in, 563-73
outcomes of, 564-67
Aspects of Personality inventory: ex- 
cerpt from, 293 
Table 26, 367 
Association methods, 282 
Assumed mean, 319 
Athenians, 22 
Attitude, measurement of scientific, 546-48
Attitudes, 168, 538 
defined, 285 
measurement of, 284-87 
Attitudes inventories, 490 
Attitudes scales, 58-59, 285 
Aviation Cadet Qualifying Examina- 
tion, 248 
Ayres Handwriting Scale, 453-54, 456 
Ayres Scale for Measuring the Hand- 
writing of School Children, 103 


Badger Mechanical Drawing Tests, 582 

Baker “Telling What I Do” tests, 295 

Basic business education, defined, 595-96


Basic Skills, Iowa Every-Pupil Tests of, 
cutout scoring stencil, Fig. 5, 130 
See also Iowa Every-Pupil Tests of 
Basic Skills 
Beach Music Test, 562 
Behavioral categories (Fig. 22), 302 
Beliefs, measurement of superstitions, 
548-49 
Bell Adjustment Inventory, 203 
excerpt from, 60 
Bennett Stenographic Aptitude Test, 606
Betts-Keystone Telebinocular, 403-4
Betts Ready to Read Tests, 403
Bi-factor tests, 57, 253-54, 273-74
Binet-Simon Scale, 29-30
Biology, 534, 539
Bisbee Commercial Education Survey
Tests, 600
Bluffing, 165
Bookkeeping, standardized tests in, 602-3
Boston, examinations in, 22
Breidenbaugh Bookkeeping Tests, 603
Brightness, index of, 263-64
Brown-Carlsen Listening Comprehension Test, 403
Brown Food Score Cards, 589
Business education: aims and objectives in, 595-97
basic, defined, 595-96
informal objective testing in, 603-4
measurement and evaluation in,
595-612
measurement of interests in, 605
predictive tests in, 605-8
standardized tests in, 597-603
technical, defined, 596


California Achievement Tests, 131, 207, 
638-39 
handwriting scale of (Fig. 12), 208 
California Algebra Aptitude Test, 525 
California Arithmetic Test, profile chart 
for (Fig. 15), 226 
California Language Test (Table 24), 
365 
California Reading Tests, 108, 411 
Reading Vocabulary in Social Science, 
494 


California Short-Form Test of Mental Maturity, 253
excerpts from, 254 


California Test of Mental Maturity, 253 
Cardall Primary Business Interests Test, 
605 
Cardiovascular tests, 623 
Carroll Prose Appreciation Test, 413 
Case study, 297, 208 
Castle Mechanical Drawing Test, 582 
Catch words, 189 
Central tendency, measures of, 317-28 
Chance, correction for, 174-75 
Chance-half coefficient, 74, 386 
Character Education Inquiry, 32 
Chart: class analysis, 229-30, 232-33 
profile, 225-26 
pupil progress, 226-27, 229 
Check lists, 53, 204-6 
Chemistry, 534, 540 
Chicago Test of Clerical Promise, 607-8 
Chicago Tests of Primary Mental Abili- 
ties, 253 
Chinese examinations, 26 
City testing bureaus, 122 
Civics and government tests, 491 
Clapp-Young Self-Marking Tests, 131 
Class analysis and diagnosis, 106-7 
Class analysis chart, 229-30, 232-33
Class intervals, 312-16 
Classical Investigation, 470, 471 
Classroom measurement, practical as- 
pects of, 12-15 
Classroom testing, 138-39, 160-62 
Clerical aptitude, trade tests of, 607-8 
Clerical aptitudes tests, 606-7 
Clues and suggestions, 188 
sparing use of, 89-90 
Coefficient: chance-half, 74 
footrule, 74, 387 
objectivity, 79 
reliability, 73 
retesting, 73 
validity, 70 
Commercial arithmetic, standardized tests in, 603
Common School Journal, 22, 23
Commonwealth List, 445
Comparability, 81-82
defined, 81
Compass Diagnostic Tests in Arithmetic, 48
excerpt from, 49
Completion exercises, 626 
Completion items, 178-80, 473, 540
Composite scores, 353-55 
Comprehensive Test Program, 644

Computational skills, standardized test- 
ing in, 511-12 

Concepts, 168, 169, 538 

Content areas, standardized tests in, 
598-99 

Converted scores, 350 

Cook-Bixler Literature Appreciation Tests, 413

Cooperative Achievement Tests, 639-41

Cooperative Biology Test, 543

Cooperative French Test, 473, 474, 475


Cooperative German Test, 474 

Cooperative Inter-American Tests, 
469 

Cooperative Latin Test, 477 

Cooperative Literary Acquaintance 
Test, 413 

Cooperative Literary Comprehension and Appreciation Test, 413

Cooperative Literary Comprehension Test, 413

Cooperative Mechanics of Expression 
Test (Table 25), 366 

Cooperative Reading Comprehension
Test, 410-11

Cooperative Science Test for Grades
7, 8, and 9, 542

Cooperative Social Studies Test for
Grades 7, 8, and 9, 494

excerpt from, 184 

Cooperative Spanish Test, 474 

Cooperative Study in General Educa- 
tion, 27, 224, 286 

Cooperative testing programs, 122-23 

Cooperative Tests of General Profi- 
ciency, 641 

Cooperative Vocabulary Test, 413 

Coordinated Scales of Attainment, 
634 

Correction for chance, 174-75 

Correlation coefficient, 372-81 

meaning of, 372, 381-84 

Counting techniques, 210-13 

Course objectives, 68-70 

Crary American History Test, 495

Criteria, of a good examination, 65-85 

CTB Scoreze answer sheets, 638

Cumulative frequency graph, 359-63 

Curricular validity, 68-70 

Cutout scoring stencil, Iowa Every-Pupil Tests of Basic Skills, Fig. 5,
130
Davis Test of Functional Competence
in Mathematics, 516 
Deciles, 347 
Denny-Nelson American History Test, 
406 
Derived scores, 257-64, 342-55
and norms, 342-43
based on average performances, 343-44
based on variability of performances,
346-50
quotients as, 345-46
Determining course marks, 195
Determining relationships among the test
scores, 371-91
Detroit Clerical Aptitudes Examination,
607
Detroit General Aptitudes Examination,
607
Detroit Mechanical Aptitude Examinations, 586
Diagnosis: Fig. 1, 50
individual pupil, 448-49
meaning and importance of, 113-16
nature of, 114-15
Diagnostic and analytic tests, 48
Diagnostic Examination of Silent Reading Abilities, 411
Diagnostic profile charts, 108, 110
Diagnostic Reading Tests, 406, 407,
412
Diagnostic testing, 12-14
Diagnostic tests, 48-49
Differential aptitudes tests, 254, 274
excerpts from, 255
Difficulty, 90-91
Dimond-Pflieger Problems of Democracy Test, 493
Direct observation, 61, 281, 300
Directed observation, 283
Discriminative power, 91-93
Dispersion, measures of, 328-37
Distributive education, defined, 596
Double negatives, rox
Double-entry table, 374-75
Drake Musical Memory Test, 5509
Drawing scales, 567-68
Drawing tests, 568-70
Drill, 117-18
Drill tests, 50
Dunning Physics Test, 541
Duplicate forms, of test, 82
Durost-Center Word Mastery Test, 412
Durrell Analysis of Reading Difficulty,
404

Dyer Blackboard Test for Tennis Abil- 
ity, 626 

Dyer-Schurig-Apgar Basketball Test, 
626 


Economy, defined, 81
Economy of time, 165 
Educational and mental tests, 37-38 
Educational evaluation, See Evaluation
Educational measurement: first book on, 
24 
Present status of, 33-34 
See also Measurement 
Educational quotient, 345 
Educational testing: from 1800 to 1900, 
22-24 
from 1900, 24-27 
Educational tests, 38, 44-54
described, 5-6 
early, 21-22 
general characteristics of, 5-8 
Eight-Year Study, 2; 
Elwell-Fowlkes Bookkeeping Tests, 602, 


Emotional adjustment, measurement of, 
291-96 
Engle-Stenquist Home Economics Test, 
590 
English: aims and outcomes in, 422-29 
aptitude measurement in, 439-40 
as foreign language, 469-70 
educational and social importance of, 
422-24 
remedial instruction in, 440 
Ephraimites, 20 
Equated Scores, 350 
Error of grouping, 312 
Essay examinations, 44, 45 
Essay questions, types of, 151-53 
Essay tests, 42-43, 141-57 
advantages of, 147-50 
conclusions concerning, 150-51 
improving, 151-57
limitations of, 142-47
scoring, 153-55
suggestions for improving, 156-57
Essential High School Content Battery, 
641-42 
Establishing reliability, 103-5 
Establishing validity, 103-5 


Evaluation, 3-4 
in art education, 563-73 
in business education, 595-612 
in expressive language arts, 422-64 
in fine arts, 553-73 
in foreign languages, 465-82 
in health and physical education,
613-30
in health education, 613-20, 628-30 
in home economics, 588-94 
in industrial and practical arts, 574- 
94 
in industrial arts, 581-88, 592-94 
in mathematics, 503-29 
in music education, 554-63, 570-73 
in physical education, 621-30 
in receptive language arts, 392-421 
in sciences, 530-52 
in social studies, 483-502
meaning of, 218-19 
need for, in education, 1-3 
of general educational achievement, 
631-48 
See also Measurement 
Evaluation tools and techniques, 217-37 
Evaluations, personality, 58-61 
Evaluative instruments, 3 
and techniques, 44 
development of, 27 
Evaluative techniques, 27, 54, 61, 232- 
35, 297-303 
Evaluative tests, 53, 219-25 
Evaluative tools, 27, 54, 225-33 
using, 234-35 
Examination, characteristics of a good, 
65-85 
Examinations, in Boston, 22 
Exercises, matching, 185-86 
constructing, 193-94 
Expectancy, standards of (Table 7), 145 
Expressive language arts, measurement 
and evaluation in, 422-64 
Extensive sampling, 163-64 
effect of (Fig. 10), 164 


Factor analysis, 31 

Factual tests, 489 

Faculty theory, 241 

Fine arts, measurement and evaluation 
in, 553-73 

First Year German Test, 473 

Fischer Mechanical Drawing Tests, 582
Foods, tests concerning, 589
Footrule Coefficient, 74, 387 
Foreign languages: aptitude and prog- 
nostic tests in the, 477-78 
diagnosis and remediation in the, 479 
in elementary school, 470 
measurement and evaluation in, 465- 
82 
objectives and outcomes of the, 466- 
72 
Franseen Diagnostic Tests in Language, 
437 
Frear-Coxe Clothing Test, 589 
Free association, 281 
Freeman Chart for Diagnosing Faults 
in Handwriting, 457 
French Life and Culture Test, 473 
French-Cooper Volleyball Test, 626 
Frequency polygon, 356-58 
Frequency table, 313 


Gates Reading Survey for Grades 3 to
10, 411
Gates-Strang Health Knowledge Test, 
616 
General achievement, measurement and 
evaluation of, 631-48 
General achievement batteries, types of, 
632 
junior high school, 634-38 
senior high school, 641-44 
six-year high school, 638-41 
General achievement tests, advantages 
and limitations of, 632-34 
General educational development, tests 
of, 644-46 
General Educational Development, 
USAFI Tests of, 646 
General intelligence, 248-49 
individual scales of, 244-48 
General intelligence tests, 55-56, 244-49, 
270-72 
General mathematics: in the junior- 
senior high school, 508-10 
measurement in arithmetic and, 511- 
17 
General science, 534, 539 
Generalized Attitudes Scales, 285 
Geography tests, 491 
Geometry: measurement in plane and 
solid, 521-24 
standardized testing in, 523-24 
Gilbert Business Arithmetic Test, 603 
Gileadites, 20 
Glenn-Gruenberg Instructional Tests in 
General Science, 539 
Good Neighbor Policy, 468 
Grade equivalents, 343-44 

Grade norms, 95-96, 363 

Grammar and usage, standardized tests 
in, 436-37 

Graphical representation, 355-63 

Gray-Votaw-Rogers General Achieve- 
ment Tests, 108, 636-37 


Fig. 4, 110 
Greene-Stapp Language Abilities Test, 
437, 438 
Greenwich Astronomical Observatory, 
28 


Greenwich Hospital School, 23 

Group comparisons, 112 

Group data, 309-16 

Group dynamics, evaluation of, 61, 299- 
303 

Group factors, 241 

Group intelligence tests, 55-56 

Group-factor tests, 57, 253-55, 273-74 

Grouping data, 309-16 

Guess Who Test, 293-94 

Guessing, 166-67 


Haggerty-Olson-Wickman Behavior Rat- 
ing Schedules, 294 
excerpt from, 295 
Handwriting: measurement and reme- 
diation in, 449-60 
measurement of, 455-57 
merit scales in, 456-57 
objectives and measurable qualities 
in, 452-53 
Handwriting quality, 453-54 
Handwriting rate, 454-55 
Handwriting scale, California Achieve- 
ment Tests (Fig. 12), 208 
Hayes Scale for Evaluating the School 
Behavior of Pupils, 296 
Health Activities Inventory, 223, 234 
excerpts from, 224 
Health and physical education, meas- 
urement and evaluation in, 613-30 
Health attitudes inventories, 617, 619 
Health Attitudes Inventory, 286, 288, 
617 
Health Awareness Test, 617, 618 


Health education: measurement and 
evaluation in, 613-20, 628-30 
prevention and diagnosis in, 619-20 

scope and aims of, 613-15 
Health evaluation inventories, 617-18 
Health Interests Inventory, 288 
Health inventories, 223-24, 619 
Health knowledge tests, 615-17 
Hiett Simplified Shorthand Test, 600 
Hiett Stenography Test, 600 
Hillegas Composition Scale, 436 
Histogram, 358-59 
History tests, 401 
Hoke Prognostic Test of Stenographic 
Ability, 605-6 
Holley Sentence Vocabulary Scale, 413 
Home economics: informal objective 
testing in, 590-92 
measurement and evaluation in, 588- 
94 
objectives of, 578-81 
standardized tests in, 588-90 
Home mechanics tests, 583 
Horn Basic Writing Vocabulary, 444 
Household management, tests concern- 
ing, 590 
Hunter Industrial Arts Test, 583 


Illinois Foods Test, 589 
Index of brightness, 263-64 
Index of studiousness, 268 
Individual behavior, evaluation of, 61, 
297-99 
Individual differences, recognition of, 21, 
28 
Individual intelligence scales, 55 
Individual pupil diagnosis, 107-8 
Industrial and practical arts: measure- 
ment and evaluation in, 574-94 
social and educational significance of, 
574-80 
Industrial arts: informal objective test- 
ing in, 584 
measurement and evaluation in, 581- 
88, 592-94 
objectives of, 576-78 
Industrial arts tests, 582-83 
Industrial education, defined, 574 
Informal objective testing, using results, 
194-95 
Informal objective tests, 4, 9, 43, 160-98 
advantages of, 163-65 


Informal objective tests (cont.) 
construction of, 167-73 
development of, 25-26 
possible disadvantages of, 165-67 
using, 173-75 

Instructional outcomes, 68-70 
types of, 168-70 

Instructional tests, 50 

Integral limits, 313-16 

Intelligence, 13 
bi-factor tests of, 253-54 
defined, 239 
early attempts to measure, 29 
group-factor tests of, 253-55, 273-74 
measurement of, 242-49 
multi-factor tests of, 253-55 
nature of, 239-41 
performance tests of, 57-58, 274 

Intelligence and aptitude tests, 238-76 

Intelligence quotient, 258-62, 345 
constancy of, 260-61 
distribution of the, 265-66 
future of the, 262 
social class and the, 261-62 

Intelligence testing, 13 
derived results of, 257-64 
from 1800 to 1900, 28-29 
from 1900 to the present, 29-31 
procedures for, 268-70 

Intelligence tests, 38, 54-58, 584-85 
administering and scoring, 269 
factored, 31 
first individual, 29-30 
group, 30 
individual, in America, 30 
specific, 31, 273 
uses of, 269-74 

Interest-Attitude Test (Pressey), 288 

Interests, 168, 538 
defined, 287 
informal measurement of, 280 
measurement of, 287-90 

Interests inventories, 59, 287-90 

Interests measurement, 287-90 

International Test Scoring Machine, 131 
Fig. 6, 132 

Interpretation of Data Test, 220-21, 
235 
excerpt from, 221 

Interpreting test results, 339-70 

Interpretive tests, 53-54, 220-23 

Inter-Trait Rating Scale, 303 

Interview, 233-34, 287 
Inventories: adjustment, 292-96 
health attitudes, 617 
health evaluation, 617-18 
interests, 287-90 
personality, 58-61 
Inventory of Personal-Social Relation- 
ships, 224 
excerpt from, 225 
Inventory tests, 46, 47 
Iowa Algebra Aptitude Test, 250, 525, 
526 
excerpt from, 251 
Iowa Basic-Skills Tests, 516 
Iowa-Brace Test, 623 
Iowa Every-Pupil Tests of Basic Skills, 
512, 513, 637-38 
cutout scoring stencil, Fig. 5, 130 
Language, 40, 98, 100 
excerpts from, 39 
Table 4, 99 
Table 5, 101 
Iowa General Information Test in 
American History, 493 
Iowa Grammar Information Test, 437 
Iowa High School Content Examina- 
tion, 641-43 
Iowa Language Abilities Test, 95-96, 
437, 440, 449 
Table 3, 97 
Iowa Placement Examinations, 250, 
439, 440, 478, 525, 549 
Iowa Plane Geometry Aptitude Test, 
526 
Iowa Revision of the Brace Test of 
Motor Ability, 622 
Iowa Silent Reading Tests, 405, 407, 
412 
Iowa Spelling Scales, 51, 444, 445 
Iowa Tests of Educational Develop- 
ment, 644-45 
Items: alternate-response, 180-82, 191- 
92 

completion, 178-80 
constructing, 189-93 
multiple-choice, 182-84, 192-93 
multiple-response, 183-84 
recall-type, 189-91 

simple recall, 177-78 


Johnson Test for high-school boys, 
626 
Junior high school, general achievement 
batteries, 634-38 


Kelley-Greene Reading Comprehension 
Test, 406-7 

Kellogg-Morton Revised Beta Exami- 
nation, 257 

Kilander Health Knowledge Test, 617 

King-Clark Foods Test, 589 

Kinney Scales of Problems in Commer- 
cial Arithmetic, 603 

Kirby Grammar Test, 436 

Kirkpatrick Chemistry Test, 541 

Kline-Carey Drawing Scales, 568 

Knauber Test of Art Ability, 569 

Knowledge and information tests, in 
physical education, 624-26 

Knowledges, 168, 169, 537-38 

Kuhlmann-Anderson Intelligence Tests, 
248, 263 

excerpts from, 249 

Kwalwasser-Dykema Music Tests, 558 

Kwalwasser-Ruch Test of Musical Ac- 
complishment, 561 

Kwalwasser Test of Music Information 
and Appreciation, 561, 563 


Lane-Greene Unit Tests in Plane Ge- 
ometry, 523, 524 
Language: Iowa Basic Skills Tests, 98 
Table 4, 99 
Table 5, 101 
measurement and diagnosis of, 429- 
38 
Language abilities, analytical measure- 
ment of, 437-38 
Language Aptitude Test, 478 
Language arts: expressive, measure- 
ment and evaluation in, 422-64 
receptive, measurement and evalua- 
tion in, 392-421 
Language disabilities, diagnosis of oral, 
429-32 
Language scales, oral, 429 
Language skills: analysis of, 424-28 
oral, 428 
written, 428-29 
Lankton First-Year Algebra Test, 518, 
519 
Larson-Greene Unit Tests in First-Year 
Algebra, 519, 520 
Latin: measurement in, 475-77 
objectives of, 470-72 
standardized tests in, 476-77 
Learning, measuring efficiency of, 112 


Lee Test of Algebraic Ability, 525 

Lee Test of Geometric Aptitude, 526 

Letter marks, 352-55 

Lewerenz Test in Fundamental Abili- 
ties of Visual Arts, 569, 570, 
581 

Limited sampling, 77, 142-43 

Listening and reading: factors affecting, 
399-404 
significance of, 393-94 

Listening comprehension, measuring, 
402-3 

Listening efficiency, factors in, 399- 
400 


Literary acquaintance and comprehen- 
sion, tests of, 413 
Literary appreciation, tests of, 413 
Literature, 394-95 
measurement in, 413 
Logasa-Wright Tests for the Apprecia- 
tion of Literature, 413 
Logical Reasoning Test, 222, 235 
excerpt from, 223 
Logical validity, 72 
Luria-Orleans Modern Language Prog- 
nosis Test, 478 


MacQuarrie Test of Mechanical Ability, 
586 
Maladjustment, 291 
Manual arts, 575 
Markham English Vocabulary Test, 413 
Master Achievement Tests, 634 
Mastery tests, 50 
Matching exercises, 185-86, 475, 495-96, 
542-43, 625-26 
constructing, 193-94 
Mathematics: aptitude and prognostic 
tests in, 525-26 
general significance of, 503-8 
measurement and evaluation in, 503- 
29 
objectives and outcomes of, 505-8 
McAdory Art Test, 568, 581 
McCall Inter-Trait Rating Scale, 304 
McCauley Examination in Public 
School Music, 561 
Measurement: and the total child, 14- 
15 
development of, 19-36 
in algebra, 517-20 
in art education, 563-73 


Measurement (cont.) 
in business education, 595-612 
in expressive language arts, 422-64 
in fine arts, 553-73 
in foreign languages, 465-82 
in geometry, 521-24 
in handwriting, 449-60 
in health and physical education, 
613-30 
in health education, 613-20, 628-30 
in home economics, 588-94 
in industrial and practical arts, 574- 
94 
in industrial arts, 581-88, 592-94 
in language, 429-38 
in Latin, 475-77 
in literature, 413 
in mathematics, 503-29 
in modern languages, 472-75 
in music education, 554-63, 570-73 
in physical education, 621-30 
in receptive language arts, 392-421 
in sciences, 530-52 
in social studies, 483-502 
in spelling, 441-49 
need for in education, 1-3 
of art abilities and achievement, 567- 
70 
of general educational achievement, 
631-48 
of handwriting, 455-57 
of language abilities, 437-38 
of music appreciation, 563 
of musical achievement, 560-63 
of musical knowledge, 560-62 
of musical memory, 559-60 
of musical skills, 562-63 
of musical talent, 557-60 
of scientific attitude, 546-48 
of superstitious beliefs, 548-49 
of vocabulary, 412-13 
of work-study reading, 404-11 
of written composition, 436 
standard error of, 75, 387-88 
to 1800, 20-22 
See also Evaluation 
Measuring, 3 
Measuring listening comprehension, 
402-3 
Measuring techniques, 210-13 
Mechanical aptitude tests, 585-86 
Mechanical drawing tests, 582 
Median, defined, 324 
Median of grouped data, computing 
the, 324 

Meier Art Judgment Test, 568, 569, 
582 

Meier-Seashore Art Judgment Test, 
568 


Mental age, 257-58 

Mental growth, 259 

Mental measurement, present status of, 
33-34 

Mental Measurements Yearbooks, 123, 
125 

Merit scales in handwriting, 456-57 

Merrill-Palmer Scale of Mental Tests, 
264 

Metronoscope, 404 

Metropolitan Achievement Tests, 229, 
516, 539, 635-36 

Fig. 16, 227 

Metropolitan Literature Test, excerpt 
from, 186 

Metropolitan Readiness Test, 252 

Metropolitan Reading Test, excerpt 
from, 180 

Metropolitan Social Studies Test, 496 

Michigan Speed of Reading Test, 411 

Michigan Vocabulary Profile Test, 
412 
Middleton Industrial Arts Test, 582 
Mid-measure, 323 
Midpoint, 313-16 
Minnesota Assembly Test, 586 
Minnesota Card Sorting Test, 586 
Minnesota Check List for Food Prepa- 
ration and Serving, 589 
Minnesota House Design and House 
Furnishing Test, 590 
Minnesota Interest Analysis Test, 586 
Minnesota Mechanical Ability Tests, 
586 
Minnesota Packing Blocks Test, 586 
Minnesota Paper Form Board Test, 586 
Minnesota Rate of Manipulation Test, 
608 
Minnesota Spatial Relations Test, 586 
Minnesota Vocational Test for Clerical 
Workers, 608 
Modern foreign languages: measurable 
elements in the, 472 
measurement in the, 472-75 
outcomes of the, 466-67 
standardized tests in the, 472-75 
trends in the, 467-70 




Modern School Achievement Tests, 634 
Mooney Problem Check List, 294 
Multi-factor tests, 57, 253-55, 274 
Multi-factor theory, 241 
Multiple-attribute tests, 41 
Multiple-choice items, 182-84, 473-75, 
494-95, 511-12, 512-13, 541-42, 625 
constructing, 192-93 
Multiple-response items, 183 
Murdock Sewing Scale, 589 
Murphy-Durrell Diagnostic Reading 
Readiness Test, 403 
Music, measurable qualities in, 554-57 
Music appreciation, measurement of, 
563 
Music education: aims and outcomes 
of, 554-57 
measurement and evaluation in, 554- 
63, 570-73 


Musical achievement, measurement of, 
560-63 

Musical Achievement Test, 562 

Musical knowledge, measurement of, 
560-62 

Musical memory, measurement of, 559- 
60 

Musical skills, measurement of, 562-63 

Musical talent, 554 

measurement of, 557-60 

Myers-Ruch High School Progress Test, 
643 


Nash-Van Duzee Industrial Arts Test, 
582 

National Achievement American History 
Test, excerpt from, 178 

National Achievement Tests, 634 

National Business Entrance Tests, 605, 
608-9 

National Clerical Abilities Tests, 608 

Navy General Classification Test, 248 

Nelson Biology Test, 539 

Nelson-Denny Reading Test, 411 

New Revised Stanford-Binet Tests of 
Intelligence, 245-48, 264 

New York Latin Achievement Test, 476 

Newkirk-Stoddard Home Mechanics 
Test, 583 

Non-test tools, 43-44 

Non-verbal tests, 42 

Norms, 82, 342-43, 363-67 

deriving test, 94-102 


reliability of, 104-5 
types of, 95-102 
Norms vs. standards, 102-3 


Object tests, 52, 201, 202-4 
Objective examinations and scales, 44, 
45-51 
Objective test items, constructing, 186- 
94 
Objective tests, 38, 43 
early, 23-24 
functions of, 6-8 
standardized vs. non-standardized, 
161-62 
Objectives: in algebra, 517-18 
in business education, 595-97 
in handwriting, 452-53 
in reading and listening, 395-99 
of foreign languages, 466-72 
of home economics, 578-81 
of industrial arts, 576-78 
of mathematics, 505-8 
of physical education, 620-21 
of plane geometry, 521-22 
of sciences, 531-33 
of social studies, 484-85 
of solid geometry, 522-23 
Objectivity, 4, 77-79, 89, 389 
coefficient of, 79 
defined, 77 
determination of test, 380 
Observational methods, 283 
Ophthalmograph, 404 
Oral examinations, 44-45 
early, 20 
Oral language scales, 429 
Oral language skills, 428 
Oral reading, remedial drills for, 414- 
15 
Oral tests, 42, 138-41 
advantages of, 140-41 
limitations of, 140 
Orleans Algebra Prognosis Test, 525-26 
Orleans Geometry Prognosis Test, 525 
Orleans-Solomon Latin Prognosis Test, 
478 
O'Rourke Clerical Aptitude Test, 608 
O’Rourke Mechanical Aptitude Test, 
586 
Otis Classification Test, 644 
Otis Quick-Scoring Group Tests of 
Mental Ability, 264 


Outcomes: in English, 422-29 
of art education, 564-67 
of modern foreign languages, 466- 
67 
of music education, 554-57 
of sciences, 533-34 
of social studies, 486-88 


Parke Commercial Law Test, 599 
Pearson product-moment coefficient of 
correlation, 372-81 
Per cent of average development, 263 
Percentile, defined, 347 
Percentile grade norms, 99-100 
Percentile norms, 363 
for school averages, 100-102 
Percentile ranks, 346-47, 362 
defined, 347 
Percentile scores, 264 
Percentiles, 347-48, 362 
Performance, defined, 340 
Performance measures, 201, 204-7 
Performance testing, using results of, 
215 
Performance tests, 42, 52-54, 57-58, 
199-216 
and scales, 44 
constructing, 213-14 
nature of, 200-201 
of intelligence, 274 
and aptitude, 256-57 
Performance tests in silent reading, 
403-4 
Personal constant, 263 
Personal Data Sheet, 32 
Personal reports, 281, 284 
blanks, 292-94 
Personality: defined, 280 
measurement of total, 303-5 
nature of, 279-81 
Personality evaluation, 58-61 
1800 to the present, 32-33 
Personality instruments and techniques, 
278-305 
Personality inventories, 32, 58-61 
Personality measurement: techniques of, 
281-84 
tools of, 284-96 
Personality quotient, 303-5 
Personality testing, 13 
Personality tests, 38 
antecedents of modern, 32 
Personality types, classification of, 
21 
Physical classification tests, 626-27 
Physical education: diagnosis in, 627-28 
measurement and evaluation in, 621- 
30 
objectives of, 620-21 
Physical examinations, 618-19 
Physical qualities, tests of, 622-23 
Physics, 534, 539-40 
Pintner General Ability Tests, 253 
excerpts from, 56 
Table 23, 364 
Pintner-Paterson “Long” Performance 
Scale, 257 
Fig. 19, 256 
Plane geometry: objectives of, 521-22 
prediction of success in, 526 
Posture tests, 623 
Power test, 40 
Practical arts, 576 
Practicality, 79-81 
Practice exercises, 50, 117 
Practice tests, 50 
Practices and activities, tests of, 223-25 
Prediction, significance of correlation 
coefficient for, 382-84 
Pressey Interest-Attitude Test, 59, 
288 
excerpts from, 60 
Preventive work, 116 
Primary mental abilities tests, 274 
Primitive tribes, 21 
Problem-solving: meaning of, 514 
standardized testing in, 512-13 
Problem-solving exercises, 514-16 
Problem-solving tests, 490 
Product evaluation, 201, 207-13 
Product moments, 377-79 
Product scales, 51 
Profile chart, 225-26 
for California Arithmetic Test (Fig. 
15), 226 
Prognostic Test of Mechanical Abilities, 
47, 202 
excerpts from, 203 
Prognostic tests, 46, 47 
Progress chart, 226-27, 229 
Progressive Achievement Tests, 638 
Progressive Tests in Social and Related 
Sciences, 634 
excerpts from, 182 
Projective method, 32-33, 61 
Projective techniques, 297, 298-99 

Psychological and logical validity, 68 

Psychological examinations, 38 

Psychological validity, 72 

Pupil gradation and placement, 111- 
12 

Pupil record, cumulative, 228-29 


Quality scale, 52, 207 
Quartiles, 347 
Questionnaire, 234 


Questions, discussion, short answer, 
simple-recall, 153 

Quintiles, 347 

Quizzes, 50 

Quotient: accomplishment, 267-68, 
346 


achievement, 267-68, 346 
educational, 345 
intelligence, 258-62, 345 
personality, 303-5 
reading, 345 


Range, 312, 313 
defined, 330 
Rapport, 245 
Rating Form for Fastening, excerpt 
from, 209 
Rating scales, 52-53, 207-8, 281, 283-84, 
294-96 
Read General Science Test, 539, 542 
excerpt from, 184 
Readability, 401-2 
Readiness tests, 57, 251-52, 273 
Reading, 393-94 
and listening, objectives in, 395-99 
corrective exercises in, 414-17 
defects in, 400 
oral vs. silent, 401 
remedial materials in, 417 
Reading quotient, 345 
Real limits, 313-16 
Rearrangement items, 583 
Recall items, 46 
constructing, 189-91 
Receptive language arts, measurement 
and evaluation in, 392-421 
Receptive language skills, listening and 
reading as, 393-99 
Recognition items, 46 
Relationships, measures of, 371-91 


Relative ranks, 351-52 
Reliability, 4, 72-79, 385-88 
defined, 72 
establishing, 103-5 
evaluation of test, 385-88 
Reliability coefficient, 73, 385 
Remedial drill, 117 
Remedial teaching, 12-14 
Report card, pupil, 229 
Response, uniformity of, 89 
Retesting coefficient, 73, 385 
Rigg Poetry Judgment Test, 413 
Rogers Personality Test, 292 
Rollinson Diagnostic Shorthand Tests, 
600-601 
Rorschach test, 32, 298 
Rounding numbers, 315 
Russell-Lange Volleyball Test, 626 


Salesmanship aptitude tests, 608-9 
Sampling: intensive, 142-43 
principle of (Fig. 2), 76 
Satterfield Objective Tests in English, 
413 
Scale books, 23-24 
Scale for Handwriting of Children, 25 
Scale for Measuring the Handwriting of 
School Children (Ayres), 103 
Scaled scores, 350 
Scaled tests, 38-40 
Scales, 38-39 
attitudes, 285 
Scales for the Measurement of Social 
Attitudes (Thurstone), 285 
Scaling, 39 
Scatter diagram, 374-75 
Schrammel-Gray High School and Col- 
lege Reading Test, 411 
Science: outcomes of, 537-38 
principles and generalizations of, 
535-36 
Sciences: aptitude testing in, 549 
diagnosis and remediation in, 549-50 
informal objective testing in, 544-49 
measurement and evaluation in, 530- 
52 
objectives of, 531-33 
outcomes of, 533-34 
scope of, 531-36 
standardized tests in, 538-43 
Scorability, 80-81 
Score card, 52-53, 207, 209 


Score Card for Waffles, 209 
Scores: composite, 353-55 
derived, 257-64 
percentile, 264 
standard, 264 
Scoreze, 131 
Scoring: objectivity of, 165 
subjectivity of, 143-45 
Scoring machines, 131 
Seashore English Recognition Vocabu- 
lary Test, 413 
Seashore Measures of Musical Talent, 
557-58 
Seashore-Bennett Stenographic Pro- 
ficiency Tests, 601 
Seattle Algebra Test, 518, 519 
Seattle Plane Geometry Test, 523 
Senior high school, general achievement 
batteries, 641-44 
Sentence Vocabulary Scale, 413 
Shaycoft Plane Geometry Test, 523 
Shemwell-Whitcraft Bookkeeping Tests, 
603 
Shorthand, prognostic and aptitude 
tests in, 605-6 
Silent reading: analysis and diagnosis 
in, 404-13 
performance tests in, 403-4 
Simmons-Bixler Standard High School 
Spelling Scales, 445 
Simple recall items, 177-78, 473, 493, 
511, 512, 540 
Six-year high school, general achieve- 
ment batteries, 638-41 
Skills, 168, 169 
science, 538 
written expression, 432-35 
Smith General Business Training Test, 
598 
Smith-Bixler Awareness Test in 20th 
Century Literature, 413 
Snader General Mathematics Test, 517 
Social education, defined, 483-84 
Social learning, defined, 483 
Social sciences, defined, 483 
Social studies: diagnosis and remedia- 
tion in, 499-500 
evaluation in, 497-99 
informal objective tests in, 496-97 
interpretive tests in, 492 
kinds of tests in, 489-90 
measurement and evaluation in, 483- 
502 
objectives of, 484-85 
organization of, 485-86 
outcomes of, 486-88 
scope of, 484-86 
standardized tests in, 489-96 
Sociogram, 61, 299-301 
Fig. 21, 301 
Sociometric method, 299-303 
Solid geometry, objectives of, 522-23 
Source scales, 51 
Spartans, 21 
Spearman-Brown Prophecy Formula, 
74, 386 
Specialized achievement batteries, 644 
Specific determiners, 90, 191 
Specific intelligence tests, 56-57, 250-52 
Speed test, 40 
Spelling: measurable qualities in, 441 
measurement and remediation in, 
441-49 
remedial work in, 449 
social and educational significance of, 
441 
Spelling disabilities, diagnosis and 
remediation of, 447-48 
Spelling tests, construction of, 444-47 
Spitzer Study Skills Tests, 407, 408- 
10 
Sports, tests of proficiency in, 626 
SRA Dictation Skills test, 601 
SRA Primary Mental Abilities Tests, 
253 
Standard deviation, 330-37 
defined, 330 
derived scores based on the, 348-50 
of grouped data, computing the, 333- 
37 
of ungrouped data, computing the, 
332-33 
Standard error of measurement, 75, 
387-88 
Standard measures, 340 
Standard scores, 264, 350 
Standardization, meaning of, 87 
Standardized achievement tests: first, 
25 
later development of, 26-27 
Standardized educational tests, 4, 9-12 
Standardized tests, 42, 43, 86-137 
administration of, 11 
administrative uses of, 109-12 
constructing, 86-105 
guidance uses of, 109 
Standardized tests (cont.) 
instructional uses of, 105-8 
practical uses of, 105-19 
scoring of, 11-12 
selection of, 11 
Stanford Achievement Examinations, 
516, 539 
Stanford Achievement Tests, 437, 634- 
35 
Stanford Revision of the Binet Scale, 30, 
246 
Statistical methods, 308-89 
foundations of, 28 
Statistical validity, 68, 70-72 
Stenography, standardized tests in, 600- 
601 
Stenquist Assembling Tests, 586 
Stenquist Mechanical Aptitude Tests, 
586 
Stevenson-Trilling Tests in Compre- 
hension of Patterns, 589, 590 
Streeter-Trilling Food Preparation Test, 
589 
Strong Vocational Interest Blank, 288 
Studiousness, index of, 268 
Subject norms, 99-100 
Summarizing test results, 308-38 
Superstitious beliefs, measurement of, 
548-49 
Survey tests, 46 
Symonds Foreign Language Prognosis 
Test, 478 
Szondi Test, 298 


Tabulation of test scores, 309-16 
Tastes and preferences, 168 
Tate Economic Geography Test, 598, 
599 
Teacher-made tests, 9, 42-43 
Teacher's Handbook of Technical Vo- 
cabulary, 412 
Technical business education, defined, 
596 
Techniques, 43-44 
evaluative, 232-35, 297 
projective, 298-99 
verbal association, 282 
visual stimulus, 282 
“Telling What I Do” tests (Baker), 
295 
Terman Group Test of Mental Ability, 
107 


Test, 38 
reliability of, 104 
validity of, 104 
Test check list, 125 
Test content, validity of, 87-88 
Test exercise: in foods, 591 
in textiles, 592 
Test for Ability To Sell, 608 
Test forms, equating, 93-94 
Test items, constructing and validating, 
88-93 
Test norms, 363-67 
Test rating scales, 124-25 
Test results, analyzing and interpreting, 
12, 134 
Test score: defined, 340 
meaning of a, 315, 330-42 
tabulation of, 309-16 
Test Score Card, 157 
Testing, 3 


Testing programs: nation-wide, 123 


planning, 119-23 

state-wide, 122-23 

steps in, 119 
Tests, 38-40, 43-44 

administering, 126-28 

aptitude, 250-51, 273 

bi-factor, 273-74 

essay, 141-57 

evaluative, 219-25 

first in the school, 22 

general classification of, 37-44 

general intelligence, 244-49, 270-72 

hand-scored, 129-31 

in algebra, 518-20 

in bookkeeping, 602-3 

in business education, 597-609 
content areas, 598-99 
commercial arithmetic, 603 
geometry, 523-24 
grammar and usage, 436-37 
home economics, 588-92 
home mechanics, 583 
industrial arts, 582-83, 584 
Latin, 476-77 
modern languages, 472-75 
sciences, 538-43, 544-49 
silent reading, 403-4 
social studies, 489-97 
stenography, 600-601 
trigonometry, 525 
typewriting, 601-2 
informal objective, 160-98 
Tests (cont.) 
intelligence and aptitude, 238-76 
interpretive, 220-23 
inventory, 46, 47 
machine-scored, 131 
multi-factor, 274 
non-verbal, 41-42 
of art abilities, 569-70 
of art appreciation, 568-69 
of clerical aptitudes, 606-7 
of computational skills, 511-12 
of general educational development, 
644-46 
of health knowledge, 615-17 
of literary acquaintance and compre- 
hension, 413 
of literary appreciation, 413 
of mechanical drawing, 582 
of physical qualities, 622-23 
of practices and activities, 54, 223-25 
of problem-solving, 512-13 
of proficiency in sports, 626 
oral, 138-41 
performance, 41-42, 199-216 
power, 40 
prognostic, 46, 47 
readiness, 251-52, 273 
scoring, 128-33 
selecting, 123-26 
self-scoring, 131 
significance of, 8-9 
specific intelligence, 250-52, 273 
speed, 40 
standardized, 42, 43, 86-137 
survey, 46 
teacher-made, 42-43 
types of, 8-12, 37-64 
verbal, 41-42 
when to give, 121-22 
Textiles, tests concerning, 589-90 
Thematic Apperception Test, 298 
Thompson Business Practice Test, 598, 
599 
Thorndike Extension to the Hillegas 
Scale for the Measurement of Qual- 
ity in English Composition by 
Young People, 436 
Thorndike Scale, handwriting, 456 
Thorndike Teacher’s Word Book, 444 
Thurstone Clerical Ability Test, 606-7 
Thurstone Scales for the Measurement 
of Social Attitudes, 285 
Time-limit test, 41 
Timing devices, 206-7 
Tools, evaluative, 225-33 
Total child, measurement and the, 14-15 
Trade tests, 586-88, 607-8 
Traxler High School Reading Test, 411 
Traxler Silent Reading Test, 411 
Trigonometry: measurement in, 524-25 
objectives of, 524 
standardized testing in, 525 
True-false items, 541, 624 
T-scores, 349-50 
Turse Shorthand Aptitude Test, 606 
Turse-Durost Shorthand Achievement 
Test, 600 
Two-factor theory, 241 
Typewriting, standardized tests in, 601-2 


Understandings, 168, 169-70, 538 

Unit Scales of Aptitude, 439, 440 

Unit Scales of Attainment in Foods and 
Household Management, 590 

Unit Scales of Attainment in Language, 
437 

Unit Scales of Attainment in Reading, 
411 

United States Armed Forces Institute 
Tests of General Educational De- 
velopment, 646 

University of Bologna, 22 

University of Paris, 22 

Using the informal objective test, 173- 
75 

Utility, 82-83 


Validity, 4, 66-72, 385 
coefficient of, 70 
curricular, 68-70 
defined, 66 
determination of test, 385 
establishing, 103-5 
Van Wagenen General Science Reading 
Scales, 539 
Van Wagenen Reading Scales in Litera- 
ture, 413 
Variability, measures of, 328-37 
Verbal association techniques, 282 
Verbal tests, 41-42 
Visual arts tests, 581-82 
Visual stimulus techniques, 282 
Vocabulary, measurement of, 412-13 
Vocational tests, 586-88 


Watson-Glaser Test of Critical Think- 
ing, 221 
excerpts from, 222 
Wells Knowledge Test in Machine 
Shop, 583 
Wells Knowledge Test in Printing, 
583 
Wells Knowledge Test in Woodworking, 
Wells-Laubach Industrial Arts Test, 
583 
Wells-Laubach Knowledge Test of Me- 
chanical Drawing, 582 

Whole child, 3 

Wilkins Prognosis Test in Modern 
Languages, 478 

Willing Scale for Measuring Written 
Composition, 436 
Winnetka Graded Book List, 402 

Words, systematic sampling of, 444 

Work-limit test, 40 

Work-study reading: measurement of, 
404-11 

remedial drills for, 416-17 

Wright Achievement Test in Mechanical 
Drawing, 582 

Writing, handedness as factor in, 459-60 

Written composition, measures of gen- 
eral merit of, 436 

Written examinations, early, 20 

Written expression, skills of, 432-35 

Written language skills, 428-29 

Written quiz, 45 


Z-scores, 349 


